I.   INTRODUCTION

A.   Image  Processing  in  Automated  Cytology 

Images  of  biological  cells  are  being  increasingly  used  in  quanti- 
tative studies  of  cell  properties.   New  techniques  have  become  available 
which  elucidate  subtle  cell  structures  and  functions.   Procedures  such 
as  stoichiometric  staining,  fluorescent  or  radioactive  tagging  with 
antibodies  or  chemical  precursors,  and  autoradiography  produce 
images  whose  optical  properties  can  be  used  to  analyze  complex  biological 
events.   However,  conversion  of  optical  properties  to  numeric  form  and 
subsequent  interpretation  become    problems,  especially  when  large  cell 
populations  are  involved.   Information  in  biological  images  is  typically 
rather  "noisy,"  owing  to  the  heterogeneity  and  variation  inherent  in 
such  materials.   Simple  automatic  approaches  to  data  extraction  are 
sensitive  to  this  noise.   Frequently,  morphological  considerations  are 
needed  to  limit  the  domain  of  measurement  to  particular  structures  to 
avoid spurious signals [97, 120, 191, 261]. Data conversion and interpretation
by humans, on the other hand, can cope with the noise problem,
but  are  not  cost-effective  for  some  applications  of  interest,  even 
when  augmented  by  interactive  computation  facilities. 

What  is  needed,  then,  is  the  development  of  cost-effective  image- 
processing  algorithms  which  can  cope  with  morphological  criteria  at  the 
level  of  complexity  present  in  biological  materials.   These  algorithms 
can  produce  some  measurements  directly,  or  can  be  used  in  conjunction 
with  other  analytical  instruments.   For  example,  an  image-processing 
algorithm could be used to locate a cell nucleus, whereupon an electron
microprobe could be automatically directed to perform a chemical analysis.
Conversely,  image-processing  could  be  used  to  automatically  apply 
quantitative adaptations of human-oriented morphological criteria to
the output of existing high-speed cell analyzers. Flow and electrostatic
particle transport systems have the capacity to handle, analyze
and  even  sort  large  numbers  of  cells,  often  as  many  as  100,000  per 
minute.   (See  section  III.B.2.)   These  devices  typically  apply  measure- 
ments, such  as  light  absorption  at  a  particular  wavelength,  to  an  entire 
cell  at  once  with  no  attempt  to  resolve  intracellular  structure.   Some 
applications  would  benefit  from  the  ability  to  rapidly  filter  large 
quantities  of  cells  using  high-speed  whole-object  measurements  followed 
by  automatic  morphological  analysis  of  the  residue. 

B.   The  Subject  and  Scope  of  this  Thesis 

In  this  thesis  a  particular  application  of  digital  image  processing 
in  cytology  is  examined:   automated  analysis  of  the  well-known  "Pap" 
smear.   Pap  smears  are  samples  of  epithelial  (skin)  cells  from  the 
uterine  cervix  and  are  used  to  detect  cancer  of  the  uterus  while  it  is 
in  an  early  stage  and  relatively  easy  to  cure.   Development  of  an  auto- 
matic device  to  screen  these  samples  has  been  recognized  as  of  great 
potential  value,  almost  from  the  time  the  test  was  first  devised  by 
Papanicolaou.   However,  this  has  proven  to  be  a  formidable  task,  as  demon- 
strated by  unsuccessful  development  projects  undertaken  by  such  organi- 
zations as  the  National  Cancer  Institute,  IBM  Corporation,  and  Vickers, 
Ltd.   Accordingly,  the  goals  of  this  thesis  are  prudently  restricted 
to  something  less  than  a  complete  implementation.   Actual  implementation 
of  a  system  to  analyze  cervical  smears  would  require  a  careful  analysis 
of  the  options  available  in  staining,  specimen  preparation,  transport 
and  sensing  to  determine  the  most  effective  combination.   It  seems 
likely  that  much  deeper  insights  into  the  nature  of  cancer  cells  will 
be  forthcoming  in  the  next  few  years.   One  would  expect  that  this  new 
knowledge will have a significant effect on the technological direction.

As is discussed in some detail in the next chapter, a cervical
smear consists of a mixed population of cells spread on a microscope
slide and stained. The cells have a tendency to clump together, fold,
and otherwise present a very heterogeneous and confusing picture to an
observer,  whether  human  or  machine  (see  Figure  2).   This  clumping  and 
overlapping  will  frequently  be  a  problem  whenever  loose  cells  are  to 
be  examined,  even  if  special  preparative  techniques  are  used  to  disperse 
them.   The  fact  is  that  many  cell  types  are  supposed  to  stick  together, 
and  treatment  of  the  sample  to  defeat  this  may  result  in  the  loss  of 
important  information. 

In  the  present  work,  the  prime  objective  was  to  examine  the 
effectiveness  of  a  parallel  digital  image  processor  in  the  analysis  of 
cervical  smear  imagery.   A  wide  range  of  activities  could  support  this 
purpose.   In  order  to  define  the  problem  adequately  for  machine  imple- 
mentation, and  also  to  avoid  expending  excessive  time  on  a  complex 
biomedical  problem,  it  was  decided  to  accept  as  correct  certain  con- 
clusions drawn  in  the  course  of  development  of  the  Cytoanalyzer  project, 
an  early  attempt  to  automate  the  screening  of  cervical  smears  by  image 
processing.   The  Cytoanalyzer  project  is  discussed  more  thoroughly  in 
chapter  III.   The  pertinent  conclusions,  paraphrased  from  references 
[206]  and  [207]  are  as  follows: 

(1)  Normal  cells  shed  from  the  cervix  and  vagina  exhibit  a 
functional  relationship  between  size  and  optical  density  of  the  nucleus 
that approximates e(n) = Q/D(n)**2, where e(n) is nuclear optical density
and D(n) is nuclear diameter, suggesting that the nucleus contains a
constant  quantity,  Q,  of  dye-binding  material  in  normal  cells  throughout 
various  stages  of  differentiation. 

(2)  There is a continuous spectrum of change toward increased
optical density of the nucleus and increased nuclear size (relative to
overall cell size) as one progresses from normal cells to cancer cells.

(3)  Smears classified as being associated with cancer show the
presence  of  a  second  population  of  aberrant  cells  superimposed  on  the 
normal  cell  population. 

These  conclusions  were  based  on  the  use  of  a  standard  Papanicolaou- 
prepared  smear,  and  were  qualified  by  the  statement: 

"The  application  of  these  methods  to  an  automatic  instrument 
implies  the  condition  that  the  instrument  make  no  errors  in  measurement 
or  cell  recognition."   [206,  p.  468] 
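
Conclusion (1) can be illustrated numerically. What follows is a minimal sketch under stated assumptions, not a description of the Cytoanalyzer: the function names and the value assigned to Q are hypothetical and were chosen only to make the inverse-square relationship concrete.

```python
def expected_density(diameter_um, q):
    """Optical density predicted by e(n) = Q / D(n)**2 for a normal nucleus."""
    if diameter_um <= 0:
        raise ValueError("nuclear diameter must be positive")
    return q / diameter_um ** 2


def excess_density_ratio(density, diameter_um, q):
    """Measured density divided by predicted density.

    A value near 1.0 lies on the normal-cell curve; larger values mean the
    nucleus is darker than its size alone would predict.
    """
    return density / expected_density(diameter_um, q)


# Hypothetical constant: chosen so an 8-micron nucleus has density 0.25.
Q = 0.25 * 8.0 ** 2

for diameter in (6.0, 8.0, 10.0):
    print(diameter, round(expected_density(diameter, Q), 3))
```

Under this relationship a nucleus that is both large and dark departs from the normal curve in the direction described by conclusion (2).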

As  is  described  in  Chapter  III,  a  prototype  instrument  incor- 
porating these  principles  was  built  which  was  intended  to  measure  nucleus 
optical  density  and  nucleus  diameter  of  epithelial  cells.   Due  to  limi- 
tations in  the  image  processing  technology  then  available,  it  proved  to 
be  impossible  to  meet  the  conditions  of  the  qualifying  statement.   The 
prototype  could  not  distinguish  enlarged,  dark,  epithelial  nuclei  from 
clumps of white blood cells nor distinguish white blood cells from
certain  normal  epithelial  cell  nuclei.   Thus,  the  validity  of  the  con- 
clusions reached  in  the  design  study  could  not  really  be  tested  in  a 
clinical environment because of the presence of this biological image
"noise."


The  experiments  reported  in  this  thesis  were  motivated  by  the 
assumption  that  improved  image  processing  technology,  as  represented 
by  a  parallel  digital  image  processing  device  (described  briefly  in 
Chapter  IV)  will  permit  reliable  analysis  of  much  more  complex  patterns 
of  cell  images.   The  emphasis  here  is  not  on  the  discovery  and  extrac- 
tion of  parameters  for  distinguishing  a  malignant  cell  from  a  non- 
malignant  cell,  but  rather  on  the  construction  of  algorithms  for  a 
parallel processor which permit the machine to rapidly make sense out
of the mess of cells and debris in the microscope field so that subsequent
measurements (whatever they might be) are made on the correct
objects: epithelial cell nuclei, for example, rather than clumps of
white blood cells, cytoplasmic folds or other locally similar-appearing
phenomena. 

The  interest  in  algorithms  for  a  parallel  image  processor  stems 
from  the  belief  that  for  this  and  similar  applications,  parallel  digital 
image  processing  has  the  greatest  likelihood  of  being  sufficiently  fast, 
flexible  and  cost  effective  to  perform  the  critical  initial  steps  of 
object  location  and  identification. 

The  remainder  of  this  thesis  consists  of  two  parts.   The  first, 
Chapters  II  and  III,  contains  background  information  about  the  application 
and  a  review  of  the  rather  extensive  literature  on  automated  cytology.   The 
second  part,  Chapters  IV-VI,  describes  some  experiments  in  applying 
parallel  image-processing  to  some  of  the  cervical  image  difficulties  mentioned 
above.   The  objective  was  to  write  and  test  programs  for  hardware  like  the 
Pattern  Articulation  Unit  of  Illiac  III,  where  the  programs  could  cope  with 
three  particular  aspects  of  the  analysis  of  cervical  images: 

(1)  Rapid filtering of images of minimal photometric and spatial
resolution  to  detect  dark  blob-like  regions  representing  potential  malignant 
cell  nuclei.   The  Cytoanalyzer  study  mentioned  above  provides  the  justifica- 
tion for  searching  for  large,  symmetric  dark  regions.   The  constraint  to  use 
images  of  low  resolution  is  in  keeping  with  the  realities  of  scanner  per- 
formance in  situations  where  high  speed  is  required.   The  filtering  procedure 
is  designed  to  distinguish  between  dark  blobs  caused  by  symmetric  clumps  of 
leukocytes  and  blobs  caused  by  other  objects,  including  malignant  cells. 


(2)  Detection and counting of normal, well-differentiated epithelial
cells. This is again a filtering operation to be done on a
low-resolution image, and is useful in cervical smear analysis as an
indication of cell sample adequacy and as an input to a stopping rule.

(3)  Textural  discrimination  between  cell  nuclei  and  blobs  caused 
by  drying  artifacts.   After  the  blob  filter  isolates  areas  of  potential 
interest,  blobs  caused  by  spurious  phenomena  such  as  drying  artifact  can 
be  eliminated  by  texture  or  spectral  analysis  on  a  higher  resolution 
image.   Data  rate  requirements  are  much  reduced  by  rescanning  at  higher 
resolution  only  areas  which  require  higher  resolution  to  clear  up  ambiguities. 

While  there  are  many  other  aspects  of  cervical  cell  image  analysis 
which  could  be  of  interest,  it  is  felt  that  these  three  are  particularly 
compelling,  since  previous  attempts  to  handle  them  were  unsuccessful 
because  of  the  state  of  image  processing  technology.   By  capitalizing  on 
the recently available ability to make better use of two-dimensional
information at high speeds, it is felt that these difficulties are solvable.
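
To make the first of these three tasks concrete, the following is a minimal serial sketch of locating dark blob-like regions in a low-resolution image. It is not the parallel algorithm developed in later chapters, and it does not attempt to separate leukocyte clumps from other objects; the function names, the 4-connected grouping, and the bounding-box symmetry test are all assumptions introduced for illustration.

```python
from collections import deque


def dark_blobs(image, threshold):
    """Group dark pixels (density >= threshold) into 4-connected blobs.

    `image` is a list of equal-length rows of optical densities.
    Returns a list of blobs, each a list of (row, col) positions.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            seen[r][c] = True
            queue, blob = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def is_large_symmetric(blob, min_area, max_aspect):
    """Crude size-and-symmetry test based on the blob's bounding box."""
    ys = [p[0] for p in blob]
    xs = [p[1] for p in blob]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    aspect = max(height, width) / min(height, width)
    return len(blob) >= min_area and aspect <= max_aspect


# Hypothetical 4 x 6 low-resolution field: one 2 x 2 blob, one stray pixel.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 9],
    [0, 0, 0, 0, 0, 0],
]
candidates = [b for b in dark_blobs(image, threshold=5)
              if is_large_symmetric(b, min_area=4, max_aspect=1.5)]
print(len(candidates))
```

Only the candidate regions would then need to be rescanned at higher resolution, which is the data-rate saving noted under item (3).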


II.   CERVICAL SMEARS

A.   Purpose and Use of Cervical Cell Sample

The  epithelial  cells  of  the  body  cover  and  line  organs,  providing 
protection  in  some  cases  and  secreting  or  absorbing  vital  fluids  in 
others.   Since  epithelial  cells  are  subject  to  various  stresses,  they 
are renewed and sloughed off (exfoliated) continuously [63, pp. 189-218].
In some cases, exfoliated cells are carried by fluids to locations
in  the  body  where  they  can  be  easily  sampled  (sputum,  urine).   These 
phenomena  provide  a  painless  and  low-cost  way  of  obtaining 
a  pseudo-biopsy  of  tissues  which  may  otherwise  be  accessible  only  by 
surgery. 

The  epithelium  of  the  uterine  cervix,  being  relatively  accessible, 
can  be  sampled  more  directly  by  scraping.   Under  the  present,  manual 
system  the  cells  are  spread  on  a  microscope  slide,  fixed,  and  stained 
according  to  procedures  described  by  G.  N.  Papanicolaou  in  the  early 
1940's [156-158, 244]. An appropriately trained person determines
whether  the  sample  contains  tumor  cells  or  cells  from  "pre-cancerous" 
conditions  by  looking  for  certain  features  of  individual  cells,  and  by 
looking  at  the  overall  pattern  of  cells  on  the  slide.   This  task  is 
commonly  broken  into  two  parts:   a  preliminary  screening  to  eliminate 
indisputably  normal  slides,  and  then  a  close  scrutiny  of  the  remaining 
slides  by  a  pathologist.   The  screening  phase  is  usually  delegated  to 
a cytotechnologist, a technician who has undergone at least six months
of  formal  training  in  cytotechnology  followed  by  another  half-year  of 
on-the-job  training.   Some  parameters  which  describe  the  effectiveness 
and  load  on  this  system  are  presented  in  Table  1,  which  is  a  compilation 
of  information  from  several  sources. 


Table 1.   Some Parameters Characterizing the Present
Cervical Smear Screening System

Number of deaths due to uterine cancer (US,    14 thousand/year [1]
Number of slides processed (US)                15 million/year [279]
Throughput capacity per screener               10 slides/hour
Time to screen one slide                       2-5 minutes [165]
Cost per slide                                 $1.00-$3.00 [165]
Number of cytotechnologists (US)               4500 [68]
Percent incidence, various abnormalities
    Dysplasia                                  .5-.8 [173]
    Carcinoma in situ                          .3-.4 [173]
    Invasive cancer                            .1 [173]
Probability of missed positive                 .1 [47, 110]


While the primary purpose of taking Pap smears is to detect
cervical cancer at a stage when it is easily controlled, other information
is frequently obtained from the cell sample:

• The quantity and types of cells present indicate
  whether the cell sample is adequate. For example, a
  sample consisting entirely of white blood cells was
  probably not made correctly.

• Various micro-organisms such as yeasts, bacteria and
  trichomonads may be identified.

• The degree of differentiation of the epithelial cells
  present gives an accurate indication of the status of
  hormonal activity. This is useful in estimating the
  effect of hormone therapy or detecting ovarian cancer
  in post-menopausal patients.

B.   Image  Characteristics  of  the  Cell  Sample 

1.   Variations  in  Epithelial  Cell  Morphology  Due  to  Differentiation 

Figure 1 shows on the left a cross-section through epithelial
tissue like that present on the outer (vaginal) part of the uterine
cervix. New cells are continuously being generated in the bottom (basal)
cell layer. In normal epithelium, as the cells differentiate, they
become  more  and  more  flattened  (squamous),  and  the  nucleus  shrinks. 
At  the  top  (superficial)  stratum,  the  cells  are  relatively  easy  to  dis- 
lodge and  can  be  harvested  for  examination  by  non-traumatic  scraping 
or  wiping.   On  the  right  in  Figure  1  are  top  and  side  views  of  repre- 
sentative cells  from  various  strata.   Figure  2  is  a  photomicrograph  of 
a  Pap  smear.   Intermediate  and  parabasal  cells  are  seen.   The  small 
dark  blobs  scattered  throughout  the  field  are  white  blood 
cells  (leukocytes). 

2.   Model  of  the  Cell  Sample 

As  mentioned  in  the  Introduction,  the  appearance  of  the  cell 
sample  in  standard  preparations  is  heterogeneous  and  confusing.   To 
help  sort  this  out,  a  model  as  depicted  in  Figures  3-5  was  contrived. 
This  model  applies  to  mixtures  of  white  blood  cells  (leukocytes)  and 
to  fairly  well-differentiated  squamous  epithelial  cells  only,  and  shows 
some  of  the  situations  which  occur.   In  this  model,  an  idealized  epi- 
thelial cell  is  postulated.   Since  the  Papanicolaou  staining  process 
does  not  elucidate  a  living  cell's  true  complexity,  the  cell  can  be 
regarded as having the parts shown in Figure 3 [63, 64].

Two  classes  of  transformations  act  upon  these  cells  to  affect 
the  image  appearance.   Various  biological  and  mechanical  forces  can 
cause changes in the appearance of individual cells, as shown in Figure
4. In addition, interactions between cells produce the classes of cell
images  shown  in  Figure  5.   Combinations  of  individual  cell  changes  and 
cell interaction effects can produce complex images such as Figure 6.

Figure 1.   Cross-section of Stratified Squamous Epithelium (left)
and Views of Representative Cells
(Redrawn from [245])

Figure 2.   Photomicrograph of Epithelial Cells
and White Blood Cells

Figure 3.   Cell Structure
as Delineated by Papanicolaou Stain

Figure 5.   Squamous Cell Interactions

Figure 6.   Photomicrograph of Cervical
Smear Showing Cell Interactions

C.   Characterizations of Malignant Cells

A  number  of  characterizations  of  malignant  cells  have  been  used 
or  proposed  for  cancer  cell  detection  systems;  however,  no  characteri- 
zation has  yet  been  discovered  which  unequivocally  applies  always  to 
cancer  cells  and  never  to  normal  cells  [271].   As  knowledge  of  cell 
structure and function deepens to the molecular level [64, 218], it
becomes  more  likely  that  cancer  cells  can  be  characterized  precisely 
in  terms  of  their  biochemical  differences  from  normal  cells.   Once  these 
differences  are  well-understood,  development  of  instrumentation  for 
accurate  identification  of  cancer  can  be  expected  to  follow  rapidly. 
However,  for  the  present,  morphological  parameters  interpreted  by  the 
trained  human  visual  system  provide  the  only  acceptable  cervical  cell 
analysis system [224] for cancer screening purposes.

The  rest  of  this  chapter  contains  a  review  of  some  of  the  cri- 
teria which  have  been  proposed  to  distinguish  cancer  cells  from  normal 
cells.   In  some  cases,  machines  have  been  built  to  make  the  necessary 
measurements,  although  none  has  yet  been  accepted  for  clinical  use. 

1.   Morphological Characterizations

Under  the  present  manual  systems,  cell  samples  are  characterized 
in  morphological  terms;  that  is,  in  terms  of  form  and  structure  as 
commonly  interpreted  by  processes  of  human  visual  perception.   The  form 
and structure evident in Papanicolaou-prepared cervical smears are a
consequence of the light-absorbing properties of the various stains
used,  and  the  differential  uptake  of  those  stains  by  the  compounds  com- 
prising the  cell  substructures.   In  addition,  there  is  a  certain  amount 
of  modulation  of  the  transmitted  light  by  the  cell  material  itself,  as 
evidenced  by  its  unstained  appearance.   Morphological  identification 
of  cancer  cells  is  based  on  several  criteria  [52,  110,  156,  158,  173, 
213, 245]:

• Cytoplasmic shape

• Nuclear area and shape

• Nucleus optical density

• Nuclear chromatin texture

• Ratio of nuclear to cytoplasm area

The  cell-to-cell  variability  of  these  parameters  is  also  considered  to 
be  highly  informative  in  the  sense  that  in  cell  samples  from  normal 
tissue,  one  cell  looks  very  much  like  its  neighbors.   In  cancer,  this 
regularity  and  predictability  is  upset.   For  this  reason,  among  others, 
it  is  frequently  stated  that  the  cytologist  does  not  diagnose  individ- 
ual cells,  but  rather  interprets  the  entire  slide. 
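
Two of the ideas above, the ratio of nuclear to cytoplasm area and the cell-to-cell variability of a parameter, can be given a simple numerical form. The following is a minimal sketch only: the function names, the choice of the coefficient of variation as the variability measure, and the sample areas are assumptions made for illustration and are not drawn from the studies cited.

```python
from statistics import mean, pstdev


def nc_ratio(nuclear_area, cell_area):
    """Nuclear area divided by cytoplasm area (cell area minus nucleus)."""
    cytoplasm = cell_area - nuclear_area
    if cytoplasm <= 0:
        raise ValueError("cell area must exceed nuclear area")
    return nuclear_area / cytoplasm


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


# Hypothetical (nuclear area, cell area) pairs in square microns.
uniform_sample = [(50, 1500), (55, 1600), (48, 1450)]
mixed_sample = [(50, 1500), (120, 600), (200, 450)]

cv_uniform = coefficient_of_variation([nc_ratio(n, c) for n, c in uniform_sample])
cv_mixed = coefficient_of_variation([nc_ratio(n, c) for n, c in mixed_sample])
print(cv_uniform < cv_mixed)
```

A sample in which each cell resembles its neighbors yields a small coefficient of variation, while a sample containing a second, aberrant population yields a large one.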

It  must  be  pointed  out  that  listing  these  criteria  gives  a  false 
impression  of  the  objectivity  of  the  definition  and  use  of  these  factors, 
as indicated by Spriggs [271] in a discussion of cytologic diagnosticians:

"If you want a simile, we are more like art connoisseurs,
whose  opinions  are  built  up  after  years  of  studying  the 
styles  of  different  masters.   The  art  expert  cannot  get 
much  information  from  exact  measurements,  but  he  can 
identify  the  work  of  many  different  hands  and  he  knows 
a  lot  about  imitations.   To  the  end  of  his  life  he  is 
sometimes  wrong,  and  nobody  knows  how  often." 

Some efforts have been made to quantify these impressionistic
parameters. Wied et al. [246] and Reagen et al. [173] measured large
numbers of atypical cells in smears from patients with various forms of
malignant and non-malignant conditions. Their results are presented
in an abridged form in Table 2. Various conditions are listed across
the top. Dysplasia is a non-malignant disruption of orderly growth and
a loss of differentiation which occurs in either stratified squamous
epithelium or in the mis-differentiated epithelium of metaplasia [173],
and is characterized by cells with "sufficient" cytoplasm and much enlarged
nuclei which have a fine-textured chromatin. Carcinoma in situ is an
early form of cancer which eventually may go on to invade surrounding
tissues. It is characterized, according to Table 2, by relatively
small cells (compared with normal superficial cells) having large nuclei
with  coarsely  clumped  chromatin.   The  various  forms  of  invasive  cancer 
have  increased  ratios  of  nuclear  area  to  cell  area,  and  coarsely  granu- 
lar or  opaque,  dark  chromatin.   In  these  two  studies,  no  attempt  was 
made  to  establish  the  discriminating  power  of  the  features.   The  objec- 
tive seems  rather  to  have  been  to  confirm  and  objectify  existing  subjec- 
tive criteria  used  in  visual  screening,  without  using  the  numbers  to 
analyze  the  criteria. 

Another  study  was  conducted  by  Tolles,  Horath  and  Bostrom  [206,207] 
to  establish  a  set  of  features  to  be  measured  by  the  Cytoanalyzer  screen- 
ing instrument.   In  this  study  an  attempt  was  made  to  determine  effective- 
ness of  certain  morphological  features  in  detecting  cancer.   This  research 
concerned  three  measurements:   cell  diameter,  nucleus  diameter  and 
nucleus  optical  density.   Two  hundred  cells  were  selected  at  random  from 
each  of  over  seven  hundred  slides.   All  isolated  cells  except  white 
blood  cells  were  measured  for  the  three  parameters  by  use  of  a  manual 
microphotometer. The conclusions reached by Tolles et al. are described
above in Section I.B and are in basic agreement with Wied and Reagen.

Note that chromatin texture as a feature is ignored by Tolles
et al., even though it is considered to have great discriminating power
by most authorities [1, 52, 110, 173, 245]. This may have been because
texture  analysis  at  high  speed  was  not  then  technically  feasible.   At 
any  rate,  as  shown  in  [207],  a  screening  performance  of  approximately 
zero  false  negatives  (cancer  smears  called  normal)  vs.  15%   false  posi- 
tives (normal  smears  called  abnormal)  could  be  expected  for  premeno- 
pausal women  sampled  by  cervical  scrape,  using  nucleus  optical  density 
and  nucleus  diameter  as  features.   To  obtain  this  performance,  a  smear 
was to be called abnormal if more than 1.5% of cells had optical density
greater than .5 and nucleus diameter greater than 10 microns. Performance
with these features was shown to be very sensitive to sampling
technique, with scraping producing the aforementioned excellent performance
and vaginal aspiration resulting in over 65% false positives
for 5% false negatives. Vaginal aspiration is known to be very poor
at harvesting cells from squamous cell cancer, especially carcinoma
in situ [1, p. 57]. Actual performance realized by the Cytoanalyzer is
discussed in the next chapter.

Table 2.   [Abridged measurements of atypical cells by condition, from
Wied et al. [246] and Reagen et al. [173]. Legible row labels: Chromatin
Texture; Arrangement of Abnormal Cells.]
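
The smear-level decision rule reported from [207] can be written out as a short sketch. The three thresholds (optical density .5, nucleus diameter 10 microns, 1.5% of cells) are taken from the text; the function name, the data layout, and the sample measurements are hypothetical and serve only to show how the rule would be applied.

```python
def smear_is_abnormal(cells, density_limit=0.5, diameter_limit_um=10.0,
                      fraction_limit=0.015):
    """Apply the two-feature rule to one smear.

    `cells` is an iterable of (nucleus_optical_density, nucleus_diameter_um)
    pairs. The smear is called abnormal when the fraction of cells exceeding
    BOTH limits is greater than `fraction_limit`.
    """
    cells = list(cells)
    if not cells:
        raise ValueError("no cells measured")
    flagged = sum(1 for density, diameter in cells
                  if density > density_limit and diameter > diameter_limit_um)
    return flagged / len(cells) > fraction_limit


# Hypothetical measurements: 200 cells per slide, as in the study design.
ordinary_cell = (0.3, 7.0)     # neither dark nor enlarged
suspect_cell = (0.7, 12.0)     # dark and enlarged

slide_a = [ordinary_cell] * 198 + [suspect_cell] * 2   # 1.0% flagged
slide_b = [ordinary_cell] * 196 + [suspect_cell] * 4   # 2.0% flagged
print(smear_is_abnormal(slide_a), smear_is_abnormal(slide_b))
```

Because the rule counts individual nuclei, any object mistaken for a large dark nucleus is counted as well, which is why errors in cell recognition bear so directly on performance.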

Quantification  of  chromatin  texture  analysis  has  been  somewhat 
more  difficult,  since  there  are  no  universally  understood  measurements 
of  texture.   The  very  subjective  texture  characterizations  of  Table  2 
illustrate  this.   Some  progress  has  been  made  toward  this  goal,  however, 
via experiments with computer-oriented texture measurements [15, 125,
166,  172,  187].   Some  of  these  algorithms  are  discussed  in  Chapter  III. 

The  parameters  of  cell-to-cell  variability  and  cell  arrangement 
have  apparently  not  been  subjected  to  objective  analysis  beyond  that 
of  Table  2. 

2.   Non-Morphological  Characterizations 

This  section  is  included  to  provide  context  for  the  discussions 
of  morphology  and  to  provide  background  for  the  section  on  non-morphologi- 
cal measurements  (III.C.2.).   No  claim  is  made  for  completeness  or 
currency. 

Much  is  being  learned  about  the  processes  of  malignancy  which 
very  probably  will  be  applicable  to  automated  cancer  detection.   Some 
of  these  insights  may  lead,  for  example,  to  development  of  more  specific 
staining  procedures  which  could  vastly  simplify  an  image-processing 
approach  to  cell  analysis.   Other  criteria  might  lend  themselves  to 
non-morphological  techniques  such  as  automated  chemical  analysis  or 
high-speed spectrophotometry.

Increased DNA Content

It  has  been  shown  [185]  that  cervical  cell  samples  from 
women  with  cervical  cancer  typically  have  a  population  of  cells 
with  increased  DNA  content  superimposed  over  the  population  of 
normal  cells.   An  amount  of  DNA  in  a  cell  nucleus  which  corresponds 
to  the  amount  found  in  a  normal  interphase  (between  divisions) 
body  cell  is  called  the  diploid  amount.   The  amount  found  in  a 
replicating  cell  is,  on  the  average,  twice  the  diploid  amount  and  is 
called  the  quadruploid  amount.   If  a  histogram  is  made  showing 
number of cells vs. DNA amount, then a mode normally occurs at the
diploid  amount  for  cervical  cell  samples.   If  for  some  reason 
replicating  cells  are  included,  a  mode  will  occur  at  the  quadruploid 
level.   In  samples  of  cancer  cells,  the  histogram  is  spread  out, 
with  large  numbers  of  cells  containing  aneuploid  amounts  of  DNA, 
and  with  modes  occurring  at  places  other  than  at  diploid  and  quadru- 
ploid levels,  indicating  the  presence  of  abnormal  stem  lines  or 
clones [27, 72, 247]. DNA has properties which facilitate determination
of its quantity photometrically. It absorbs ultraviolet
light in proportion to its mass, especially if the wavelength of
the light is in the vicinity of 260 nanometers [267]. Also,
DNA  binds  basic  dye  molecules.   The  mass  of  bound  dye  can  be 
measured  photometrically,  permitting  the  DNA  mass  to  be  inferred 
[237,  p.  2]. 
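
The histogram argument above can be sketched in a few lines. This is an illustration under stated assumptions, not a method taken from the references: the function names, the bin width, the tolerance around each mode, and the sample values are hypothetical, and DNA amounts are expressed as multiples of the diploid amount.

```python
from collections import Counter


def dna_histogram(dna_amounts, diploid_amount, bin_width=0.25):
    """Count cells per bin of DNA content, in multiples of the diploid amount."""
    counts = Counter()
    for amount in dna_amounts:
        relative = amount / diploid_amount
        counts[round(relative / bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


def aneuploid_fraction(dna_amounts, diploid_amount, tolerance=0.15):
    """Fraction of cells near neither the diploid amount nor twice it."""
    amounts = list(dna_amounts)
    off_mode = 0
    for amount in amounts:
        relative = amount / diploid_amount
        if min(abs(relative - 1.0), abs(relative - 2.0)) > tolerance:
            off_mode += 1
    return off_mode / len(amounts)


# Hypothetical samples, with the diploid amount taken as 1.0.
normal_sample = [1.0, 1.05, 0.95, 2.0]
spread_sample = [1.0, 1.5, 1.4, 2.6]

print(dna_histogram(normal_sample, 1.0))
print(aneuploid_fraction(normal_sample, 1.0),
      aneuploid_fraction(spread_sample, 1.0))
```

A normal sample concentrates at the diploid mode (and at twice that amount if replicating cells are present), whereas a spread-out histogram produces a large off-mode fraction.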

Immunological  Indicators 

According to one theory, the transformation of a normal cell
into a cancer cell includes changes in the chemical structure of
the external cell membrane, to the point that the cancer cell has
antigenic properties like those of a foreign protein entering the
body tissues from outside. In normal circumstances, the antibody
response would dispose of the tumor cell. In cancer, however, the
immune system is somehow suppressed, and the tumor goes unchallenged
[218, p. 595]. Presence of tumor-specific antigen has been
demonstrated in hamster cells transformed by a virus into cancer cells
[218, p. 611]. The antigen can be detected by the fluorescent
antibody technique, in which antibodies are tagged with fluorescent
dye molecules [254]. When the antibodies attach to the antigenic
tumor cells, the fluorescent dye permits identification by
fluorescence microscopy [92]. Conversely, tumor cells in humans have
been detected by measuring the absence of isoantigens. Isoantigens
are antigens which cause antibody responses when cells are
transferred among individuals of the same species, as in the case of
the A, B and O blood types. Davidsohn et al. [38] claim to be
able to demonstrate a progressive loss of isoantigen as a
precursor to the spread of a tumor to noncontiguous tissue. The
degree of loss of isoantigen is held to parallel the degree of
loss of normal cell differentiation. Some related characteristics
of cancer cells are also due to the alteration of the normal
chemical composition of the cell's external membrane. For example, it
has been hypothesized [218, p. 591] that animal cells stop
multiplying when they touch one another (contact inhibition). Some
cancer cells do not have this property, and continue to reproduce
even when piled up, as demonstrated in cell cultures. Also, normal
animal cells show "selective stickiness," i.e., they prefer to adhere
to cells of their own type. Again, cancer cells do not show this,
providing more evidence for the existence of changes in the cell
membrane. This change can also be reflected in an altered
electrical charge on the cell surface, which can be measured by
electrophoresis [249]. It was also discovered that the surface changes
result in a tendency for cancer cells to be precipitated by a certain
glycoprotein derived from wheat germ [249; 218, p. 596]. Unfortunately,
wheat germ agglutinin also appears to precipitate normal parabasal
and some benign though atypical cells [249].

Presence of Abnormal Cell Products

Another theory supposes that cancer can cause changes in
the chromosomes of a cell [218, p. 597]. If so, it would be
reasonable to expect to find changes in the proteins coded by the
chromosomes and in the biochemical processes controlled by those
proteins. These phenomena have been observed and proposed as
cancer-cell characterizers. One such discovery was that many
types of cancer cells excrete more lactic acid than do similar
normal cells. This was determined to be due to an unexplained
change in the (unknown) mechanism which regulates glucose
consumption [218, p. 594]. It is also possible that production
of abnormal enzymes can be detected by biochemical assay using
an automatic analyzer [248] or by development of specific staining
procedures [250].

III.   REVIEW OF CYTOLOGY AUTOMATION

A.  Applications 

Research and development in automating the measurement of
microscopic objects have at least a quarter-century's history. Early impetus
to this work was given by a desire to automate the extraction of size
distributions of a variety of micron-sized particles ranging from red
blood cells to coal dust, since statistically significant estimates of
these were and are very costly and difficult to obtain manually [217].
Recently, much work has been engendered by a desire to develop
instrumentation for clinical use in the cytology laboratory, and for making
automated, quantitative photometric studies of cell populations. A wide
range of purposes can be discerned, as suggested by Table 3, a somewhat
arbitrary sampling of work done in the field of cytology automation.

In  the  next  sections,  several  cytology  automation  projects  are 
discussed  in  terms  of  the  techniques  used  to  solve  two  problems  which 
seem  to  be  fundamental:  presentation  of  the  cells  to  the  machine  for 
analysis, and implementation of measuring and analysis strategies.

B.  Specimen  Transport 

The  method  by  which  cell  samples  are  prepared  and  presented  to 
the  sensor  system  is  critical  in  determining  feasible  approaches  to  the 
analysis  problem.   Three  fundamental  approaches  have  been  tried  in  the 
past: (1) microscope slides, (2) fluid transport, and
(3)  linear  deposition. 

1.   Microscope Slides

Microscope  slides  are  the  traditional  method  of  carrying  cells 
for  presentation  to  human  microscopists  and  to  a  variety  of  instruments. 
Initial  attempts  to  develop  high-speed  scanning  devices  used  slide 
transport [129, 127, 115, 87, 199-203, 208]. However, this approach
turned out to be infeasible because the sensing and analysis methods
of the time could not distinguish between objects of interest and
artifacts caused by clumping and overlapping of the cell sample while
maintaining sufficient speed. Slide transport has been used successfully
in instruments where manual location and centering of individual cells
is permissible, as in microspectrophotometric cell constituent analysis
[7, 8, 27, 31, 185-187, 226, 228-232, 235, 237, 252], or in research
environments where morphologic parameters for cell recognition are under
investigation independent of speed requirements [13-18, 20, 60, 107,
125, 147, 161, 166, 168, 179, 227, 233, 242], or in those cases where
the cell sample tends not to present severe clump or overlap problems
[75, 102, 112, 163, 216]. However, there is motivation to use
microscope slide transport in automatic cell analysis systems, for two
reasons. (1) It greatly simplifies interfacing machine results with
traditional morphologic taxonomies, especially in the case of screening
applications where positive samples need to be re-examined by a human. If
the sample is disrupted by the transport technique, much valuable
information about cell-to-cell relationships is lost. If a second sample
must be taken in each case to permit human interpretation, intricate legal
questions arise because of the possibility of missed positives in the
second specimen. (2) A capability for interpreting slide-carried cells
would permit the use of large numbers of existing labelled samples now
stored in archives. For these reasons, attempts are being made to improve
image-processing technology so that more complex cell images can be
handled [123, 164, 255, 256].

Table 3.  Automated Cytology Applications:
          Some Representative Bibliographic References

Application                             References
--------------------------------------  --------------------------------
Detection of Individual Red Blood       Green [60]
  Cell Abnormalities

Red Blood Cell Count and Size           Brecher [28]
  Distribution Abnormalities

Cell Sorting                            Fulwyler [54]
                                        Hulett [jS]
                                        Kamentsky [87]

Neuron Counting                         Dudley [19]
                                        Lipkin [107]
                                        Mansberg [115]

Microbe Colony Monitoring               Glaser [57]
                                        Mansberg [112]

Detection of White Cell Abnormalities   Ingram and Preston [l^+]

Chromosome Karyotyping                  Butler [277]
                                        Castleman [276]
                                        Ledley [102]
                                        Mendelsohn [132]
                                        Neurath [278]
                                        Rutovitz [260]
                                        Wald [216]

White Cell Differential Count           Kamentsky [155]
                                        Prewitt and Mendelsohn [166]
                                        Preston [163]
                                        Technicon Corporation
                                        Young [242]

Live/Dead Cell Count                    Kamentsky [87]

Exfoliative Cytology                    Husain/IMANCO [261]
                                        Ishiyama [76]
                                        Kamentsky [94]
                                        Nuclear Research Associates [62]
                                        Tetronics Corporation [66]
                                        Tolles [191]
                                        Vickers Corporation [135]
                                        Wied [227]

DNA Distribution                        Van Dilla [211]

2.   Fluid Transport

Many of the problems in cell location, isolation and measurement
are greatly simplified if the cells can be placed in a fluid
(usually liquid) suspension and transported past the
sensors in an orderly, predictable manner. To implement this, the cell
suspension has been transported via capillary tubes or flow cells,
sometimes with special laminar flow conditions. The latter was initially
described by Crosland-Taylor [37], and is usually referred to as a
sheath-flow or Crosland-Taylor system [144, 211]. A capillary containing
the sample suspension is enclosed in a larger tube in which a flow of
clean fluid is maintained. As the sample suspension emerges from its
capillary, it is surrounded by a sheath of clean fluid. The composite
is drawn through a constricting nozzle which focuses the central stream
down to a fine filament, perhaps only 10 µm in diameter. Clogging
problems are reduced, since the constriction is done by the fluid sheath.
Where clogging is not an acute problem, or if a close positional tolerance
is not required, then tubes or channels of diameter on the order of
100 µm can be used.

Systems have been developed in which the cell population is
physically separated into fractions on the basis of the measurements
made. To do this, the solute stream can be broken into droplets by
periodic vibration of the fluid column. A charge is applied to the
droplets containing cells of interest, enabling them to be
electrostatically deflected into a separating container. This technique was
adapted by Fulwyler [54] from an ink-writing oscillograph of Sweet [262].
Another method of cell sorting is used by Kamentsky in his Rapid Cell
Spectrophotometer [101]. A fluid switch causes selected cells to be
pulsed into a side channel and saved on a filter, an approach motivated
by a goal to select the one hundred most diagnostic cells in a cervical
cell sample for viewing under a single microscope field as a means of
solving some of the re-screening problems in the use of flow transport
systems.

3.   Linear Deposition

This approach is similar to the fluid transport systems in that the
cell sample is broken up and suspended in a fluid. However, instead of carrying
the fluid past the transducers, the fluid is laid down in a narrow track on
a plastic film or glass substrate using a device similar to a drafting pen. The
track is subsequently carried past the transducers. This procedure has some of
the advantages of both the microscope slide and fluid transport systems.
The cells are flattened and static in shape, so conventional morphologic
interpretation can be made relatively easily by machine or human. On
the other hand, the object isolation problem is somewhat reduced.
Track-laying systems have been developed by Vickers [135] and Tetronics
[39, 66] Corporations in Great Britain for cervical cell screening
machines. Both of these utilize a plastic film of about one meter in
length for a cell sample, and lay a line of cellular "ink" approximately
1000 (Vickers) or 500 (Tetronics) µm wide. A scanner built by
Mansberg for fluorescence measurements [116] included a long flexible
glass ribbon to hold the specimen.

4.   Other Systems

Mansberg also described a record-player-like scanner in which a
fluorescing sample on a membrane was scanned in a spiral pattern. Other
specially designed transport systems for Petri dishes, etc., have also been
considered [112].

C.  Measurement and Analysis

1.   Morphological Analysis

Morphology is defined by Webster as "the features comprised in the form
and structure of an organism or any of its parts." Since form and structure are
most commonly sensed visually, mechanized morphological measurements are
typically made on some sort of simulation of a retinal representation of an
image of the object. Morphology can be measured in other ways, such as by
measuring diffraction effects, but these involve fundamentally different
techniques, and are discussed briefly in Section III.C.2.

To form the retinal representation, light energy transmitted by
or reflected from the object is measured either by a retina-like array
of sensors or, more commonly, by a single sensor time-shared by scanning
and sampling in a systematic pattern. This process generates a picture,
defined by Rosenfeld [181] as a non-negative function of two variables
which is non-zero only in a bounded region of some standard size and
shape, e.g. a square. Two fundamental approaches have been used in
making and manipulating picture representations, which can be named the
one-dimensional and two-dimensional approaches. In a one-dimensional
approach, the picture function is processed by sampling in a TV-like
raster scan. This produces a time-varying video signal which can be
analyzed by digital or analogue signal processing techniques transplanted
from well-known engineering practice. A limited amount of two-dimensional
information can be processed by the use of a shift register or
delay-line buffer memory of, say, the previous few scan lines so that
measurements of the same object on succeeding scan lines can be associated.
This approach has been used in many high-speed microscope image analysis
systems so far because fairly high processing rates can be achieved at
relatively low cost. Processing occurs in real time with the scan, which
can occur at television rates. However, applications have been limited
to those where a rudimentary analysis will work, e.g. where pre-set
thresholds can be used to determine when an object boundary has been
crossed.
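
The one-dimensional approach with a one-line buffer can be sketched in software. The following is only an illustration of the idea, not a description of any of the hard-wired systems reviewed here: the picture is assumed to be a list of scan lines of density values, the names chords and object_areas are hypothetical, and a small union-find table stands in for whatever association logic a real instrument used.

```python
def chords(scan_line, threshold):
    """Return (start, end) index pairs where the signal exceeds the threshold."""
    runs, start = [], None
    for i, v in enumerate(scan_line):
        if v > threshold and start is None:
            start = i                    # boundary crossed: entering an object
        elif v <= threshold and start is not None:
            runs.append((start, i))      # boundary crossed: leaving the object
            start = None
    if start is not None:
        runs.append((start, len(scan_line)))
    return runs


def object_areas(raster, threshold):
    """Accumulate per-object area while remembering only the previous line."""
    parent = {}                          # union-find over object labels
    area = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    previous = []                        # (start, end, label) chords of prior line
    next_label = 0
    for line in raster:
        current = []
        for s, e in chords(line, threshold):
            # Objects whose chords on the previous line overlap this chord.
            touching = {find(lab) for ps, pe, lab in previous if ps < e and s < pe}
            if touching:
                label = min(touching)
                for other in touching:   # chord bridges several objects: merge
                    if other != label:
                        parent[other] = label
                        area[label] += area.pop(other)
            else:
                label = next_label       # a new object starts on this line
                next_label += 1
                parent[label] = label
                area[label] = 0
            area[label] += e - s
            current.append((s, e, label))
        previous = current
    return sorted(area.values())


raster = [
    [0, 9, 9, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 8, 0],
    [0, 0, 0, 0, 0, 8, 0],
]
print(object_areas(raster, threshold=5))
```

Only the chords of the preceding line are retained, which mirrors the limited buffer described above; anything requiring a wider spatial context is out of reach of this scheme.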

Two-dimensional approaches do not constrain analysis to a
particular spatial sequence. However, storage and accessibility of the image
become problems, as does processing time, since conventional number or
character computers have a one-dimensional topology poorly adapted to
picture analysis. Alternative architectures have been proposed [82, 106,
123, 257, 258], and a discussion of one of these is contained in the
next chapter.

The rest of this section on morphological analysis considers
various morphology-based systems which have been described for use in
automatic cytology. Discussion will frequently be in terms of three
processes: object identification and isolation, which involves locating
cells and distinguishing them from other objects which might also be
present, and also making object sub-segmentations; shape analysis, which
involves parameters of object contours, however established; and texture
analysis, which is based on brightness measurements independent (except
on a very local level) of shape information. There is some interaction
of these processes, e.g., texture analysis can be used to establish a
contour for making shape measurements, or can be used in object location.
This breakdown of morphological analysis into shape and texture analysis
is related to a definition of image by Huang et al. [259]: "We consider
an image as the sum of three components: the low-frequency part, the
edges and the textures." Texture is defined, therefore, as what is left
over when the edges and low-frequency parts are subtracted out. Shape
information is contained in the edge and low-frequency components.

In the Cytoanalyzer project, an attempt was made to develop a
clinical instrument for cervical cancer pre-screening using scanning
techniques transplanted and extended from particle counting and sizing
technology [24, 25, 188, 199-205]. A one-dimensional approach was used,
with hard-wired logic analyzing a signal generated by a Nipkow-disk
image-plane scanner. Early design objectives [218] were to use
Papanicolaou-prepared samples in order that the cytologist's re-examination
would be easier. A preliminary version included object-location
circuitry which would find cells not overlapped by other cells by
examining the sequence of signal levels as the scan crossed a
cytoplasmic border, traversed the cytoplasm, then the denser nucleus,
followed by more cytoplasm, and finally dropped off the other border.
The final implementation [24], however, used a highly modified
preparation [36] which eliminated cytoplasmic staining and was intended
to disperse the cells evenly on the slide. The image was scanned with
a two-µm aperture in an extended raster of 100 µm x 5 cm. Each
time the aperture traversed a chord of a nucleus, a video pulse resulted.
The duration was proportional to the width of the chord and the maximum
amplitude was proportional to optical density. These chords were summed,
resulting in a measurement of nuclear area and integrated density,
the parameters determined to be discriminating for cancer cells by the
study discussed in II.C.1. Cell nucleus identification was accomplished
by rejecting chord pulses preceded or followed by absorbing material.
Since the cytoplasm was unstained, a true cell nucleus was supposed not
to have this characteristic. Some rudimentary shape measurements were
used to eliminate non-nuclei. However, a residual noise count of 10-50
abnormal counts was still experienced on a normal smear. Each nucleus
was classified and counted as normal, abnormal, or indifferent. When
the normal count reached 10,000, the machine stopped and the ratio of
abnormal to normal counts (A/N ratio) was computed. If this exceeded
a threshold, the sample was declared positive.
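
The chord-summing measurement and the A/N decision rule can be paraphrased in a few lines of code. This is a sketch under stated assumptions rather than a reconstruction of the Cytoanalyzer logic: the line spacing is taken equal to the 2-µm aperture, each pulse is represented as a (chord width, peak density) pair, and the function names and the A/N threshold value are hypothetical, since the source does not give the threshold used.

```python
def nucleus_measurements(chord_pulses, line_spacing_um=2.0):
    """Sum the chord pulses belonging to one nucleus.

    Each pulse is (chord_width_um, peak_density). Area is approximated by
    chord width times line spacing; integrated density by width times
    peak density times line spacing (both are simplifying assumptions).
    """
    area = sum(w * line_spacing_um for w, _ in chord_pulses)
    integrated_density = sum(w * d * line_spacing_um for w, d in chord_pulses)
    return area, integrated_density


def screen_sample(nucleus_classes, stop_at_normal=10_000, an_threshold=0.01):
    """Count classified nuclei until enough normals are seen, then test A/N.

    `an_threshold` is a placeholder value, not one reported in the source.
    """
    normal = abnormal = 0
    for c in nucleus_classes:
        if c == "normal":
            normal += 1
        elif c == "abnormal":
            abnormal += 1
        # "indifferent" nuclei contribute to neither count
        if normal >= stop_at_normal:
            break
    if normal == 0:
        raise ValueError("no normal nuclei counted")
    ratio = abnormal / normal
    return ratio, ratio > an_threshold


pulses = [(4.0, 0.5), (6.0, 1.0), (4.0, 0.5)]
print(nucleus_measurements(pulses))

stream = ["abnormal", "normal", "normal", "indifferent",
          "abnormal", "normal", "normal"]
print(screen_sample(stream, stop_at_normal=4))
```

The sketch makes plain how sensitive the decision is to the per-nucleus classification: every misclassified leukocyte or artifact enters the ratio directly.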

In clinical trials using aspiration-collected samples and a
prototype Cytoanalyzer, a false negative rate of 10.3 per cent was achieved
for an A/N ratio threshold which correctly identified 36.6 per cent of
negatives. The false negative rate happens to be close to that estimated
as common and acceptable for human screeners (see Table 1), and the
overall performance was not very much worse than that predicted for
the vaginal aspiration samples [207]. However, these results caused
the Cytoanalyzer prototype to be deemed unusable for pre-screening for
two reasons [191]: (1) It was felt that the 10.3 per cent false negative
rate was too high for pre-screening, since the subsequent false negative
rate of the human screener would increase the error to an intolerable
level. It was felt that lowering the A/N ratio threshold to improve
the false negative rate would force the false positive rate so high that
the machine would not reduce work loads to a significant extent.
(2) Some of the counts registered were out of agreement with
experimental data. An investigation led to the conclusion that the presence
of large numbers of leukocytes overwhelmed both the normal and abnormal
count categories because single leukocytes could not be distinguished
from superficial epithelial cell nuclei and clumps of leukocytes could
not be distinguished from enlarged, abnormal nuclei. Furthermore,
individual abnormal cells were called indifferent or normal 82% of the time
because of the placement of the decision boundaries.

An approach very similar to the Cytoanalyzer was taken in the
"Automatic Cytoscreener" described by Ishiyama [76], although the
objective was to develop a screening machine rather than a pre-screener.
This device used glass slide transport, a CRT flying-spot light source,
and one-dimensional image analysis with wired-in logic. The
abnormal/normal decision for a slide was made on the basis of the frequency
distribution of nuclear diameters. Nucleus identification capability
seems to be absent, and no performance data at all on cervical material
are given.

In a pioneering study, Prewitt and Mendelsohn [166] analyzed
images of leukocytes using algorithms which have since seen use by
several others [60, 163, 242]. An off-line CRT flying-spot scanner,
CYDAC, was used to generate two-dimensional representations of cell
images which were later analyzed with a general-purpose digital computer.
However, neither the object isolation nor the texture analysis described
used spatial information directly. All processing used only the
information contained in the optical density frequency distribution (gray
value histogram) of the image. This seems to have been possible at
least partly because the images contained only the cells of interest and
were very low in noise content in terms both of random scanner noise
and interference from other similar objects. Object location was
confined to separating the smoothed histogram into three regions:
background, cytoplasm and nucleus. This separation was accomplished by
assuming that density value changes rapidly as the border between, say,
cytoplasm and background is crossed. Therefore, these border density
values will occur comparatively infrequently in the picture, and will
show up in the frequency histogram as minima. The local minimum of
lowest optical density value is taken to represent the threshold
distinguishing background and cytoplasm picture elements, and the minimum
of highest density defines the nuclear boundary. These thresholds can
be used to segment either the digitized picture itself or the density
histogram derived from the picture. Rudimentary shape and texture
features such as cytoplasm area, average nucleus density, density
distribution skewness and several others are extracted from the density
histogram, and classification occurs on the basis of techniques of
standard statistical decision theory. The initial study reported in
[166] only classified four of the five leukocyte types (basophil omitted)
and a training set of only 22 cells was used. There was no report of
trials using an unknown set. Further experiments were reported in [167].
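
The histogram-minimum rule described above lends itself to a compact sketch. What follows is an illustration of the principle only: the moving-average smoothing, the treatment of flat valleys, the 16-level density scale, and all function names are assumptions, since the source does not specify how the histogram was smoothed or how minima were located.

```python
def smooth(hist, radius=1):
    """Moving-average smoothing of a histogram (window of 2*radius + 1 bins)."""
    n = len(hist)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(hist[lo:hi]) / (hi - lo))
    return out


def local_minima(values):
    """Indices of interior local minima; a flat valley reports its first bin."""
    minima = []
    for i in range(1, len(values) - 1):
        if values[i] < values[i - 1] and values[i] <= values[i + 1]:
            j = i
            while j + 1 < len(values) and values[j + 1] == values[i]:
                j += 1                   # walk to the end of a flat stretch
            if j + 1 < len(values) and values[j + 1] > values[i]:
                minima.append(i)
    return minima


def density_thresholds(pixels, levels=16, radius=1):
    """Return (background/cytoplasm, cytoplasm/nucleus) density thresholds."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    minima = local_minima(smooth(hist, radius))
    if len(minima) < 2:
        raise ValueError("histogram does not show two valleys")
    # Lowest-density minimum bounds the background; highest bounds the nucleus.
    return minima[0], minima[-1]


def segment(pixels, t_low, t_high):
    """Label each picture element by comparing its density to the thresholds."""
    return ["background" if p <= t_low else
            "cytoplasm" if p <= t_high else "nucleus" for p in pixels]


# Hypothetical trimodal density histogram (index = density level).
example_hist = [0, 3, 8, 3, 1, 2, 5, 7, 5, 2, 1, 3, 6, 3, 0, 0]
pixels = [level for level, n in enumerate(example_hist) for _ in range(n)]
t_low, t_high = density_thresholds(pixels)
print(t_low, t_high)
print(segment([2, 7, 12], t_low, t_high))
```

As in the study itself, no spatial information enters the decision: two pictures with the same histogram would be segmented with identical thresholds regardless of how their elements are arranged.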

In contrast to the methodological and research orientation of
the Prewitt-Mendelsohn project, the Perkin-Elmer Corporation's
Cellscan/GLOPR system [74-75, 161-164] was intended to lead eventually to "a
practical system for general use" [163]. The initial purpose [161] was
to demonstrate the feasibility of semi-automatically scanning blood
smears to locate very rarely occurring binucleate lymphocytes, which are
evidence of low-level radiation damage. The approach used is also in
contrast to the Prewitt-Mendelsohn procedure in that optical density
information is eliminated as quickly as possible and processing is done
on a binary picture. On the other hand, later versions of Cellscan
utilize a histogram technique to derive the binary pictures, segmenting
the multiple-density input image into cytoplasm/red cell and leukocyte
nucleus picture elements. These planes are processed by a
special-purpose hardware image processor, the Golay Logic Processor (GLOPR)
[258, 164]. The two-dimensional GLOPR operations can extract features
for shape and texture analysis using local homogeneous operations very
similar in spirit if not in implementation to PAU operations of the
Illiac III (see next chapter). Texture analysis of the nucleus fine
structure is accomplished by an iterative thinning algorithm which
strips ones from the borders of connected components of the binary
pictures. At each iteration, the number of remaining isolated ones is
counted. A record of these counts as a function of iteration number is
a texture feature which indicates the number and size distribution of
granules in the nucleus [75]. Several shape measurements can also be
made by using similar combinations of marking, propagation, and counting
processes, for example to measure area vs. perimeter, or to locate and
size concavities and inclusions. A late version of Cellscan/GLOPR
includes an automatic vibrating-mirror scanner with a hardware
object-locating capability. Apparently, the latter is done by one-dimensional
wired-in signal analysis using simple criteria for identifying objects
likely to be white cells. All versions of Cellscan have used microscope
slide transport and standard staining procedures. Preparation has
included spinning the slide to produce a monolayer of well-dispersed
cells. The system can apparently do a differential leukocyte count as
accurately as a human scanner [264]. However, two factors militate
against its acceptance for clinical use: (1) as currently implemented,
it is much too slow (overnight to do one slide); and (2) Technicon
Corporation, the IBM of clinical lab equipment, recently introduced
Hemalog-D, a flow system with non-morphological analysis, which can do
the differential count at realistic rates.
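
The idea behind the iterative-thinning texture feature mentioned above can be conveyed by a much simplified stand-in. The sketch below is not the GLOPR algorithm: it works on a square rather than a hexagonal lattice, it uses plain erosion (a one is kept only if all four neighbours are ones), and it records the number of connected components surviving each pass instead of the number of isolated ones. All names are hypothetical.

```python
def erode(img):
    """Strip border ones: keep a 1 only if its four neighbours are all 1."""
    h, w = len(img), len(img[0])

    def at(r, c):
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0

    return [[1 if (img[r][c] and at(r - 1, c) and at(r + 1, c)
                   and at(r, c - 1) and at(r, c + 1)) else 0
             for c in range(w)] for r in range(h)]


def count_components(img):
    """Count 4-connected components of ones with an iterative flood fill."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    n = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                n += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return n


def thinning_profile(img):
    """Component count before stripping and after each pass, until empty."""
    profile = []
    while any(any(row) for row in img):
        profile.append(count_components(img))
        img = erode(img)
    return profile


granules = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
print(thinning_profile(granules))
```

Small granules vanish after the first pass and larger ones persist for more passes, so the sequence of counts carries the same kind of number-and-size information as the feature described in the text, although the numerical values would differ from those produced by the actual hardware.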

The use of spectral (color) information has been investigated by
Young [242] at MIT as a means of object isolation in color
photomicrographs of blood smears. The objective is to classify each picture
element as background, red cell or white cell. A color transparency
is scanned with a broad-spectrum CRT-generated flying spot. Two dichroic
mirrors separate the transmitted light into red, blue, and green
components, which are each sensed by a photomultiplier and digitized to
eight bits. The color data are encoded as two numbers, r = R/(R + G + B)
and g = G/(R + G + B), where R, G, and B are the three digitized PMT
outputs. The vector (r, g), called a chromaticity pair, is used as a
feature vector for a maximum-likelihood classifier in making the
red-cell/white-cell discrimination. A Prewitt-type histogram approach is
also used to separate background points from cell points.
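
The chromaticity encoding and a maximum-likelihood decision can be sketched as follows. The formulas for r and g are those given above; everything else is assumed for illustration, in particular the use of independent Gaussian class densities with equal priors, the class means and variances, and the handling of an all-zero picture element. The source does not state which density model was used.

```python
import math


def chromaticity(R, G, B):
    """Return the chromaticity pair (r, g) for digitized outputs R, G, B."""
    total = R + G + B
    if total == 0:
        return (1 / 3, 1 / 3)            # assumption: treat black as neutral
    return (R / total, G / total)


def log_likelihood(x, mean, var):
    """Log density of independent Gaussians in r and g (diagonal covariance)."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, mean, var))


def classify(pixel_rgb, classes):
    """Pick the class whose density gives the pixel's (r, g) the highest likelihood."""
    x = chromaticity(*pixel_rgb)
    return max(classes, key=lambda name: log_likelihood(x, *classes[name]))


# Hypothetical class parameters: name -> (mean (r, g), variance (r, g)).
classes = {
    "red cell": ((0.50, 0.25), (0.002, 0.002)),
    "white cell": ((0.30, 0.25), (0.002, 0.002)),
}
print(chromaticity(200, 100, 100))
print(classify((200, 100, 100), classes))
print(classify((90, 75, 135), classes))
```

Because r and g are ratios, the pair is unchanged when all three outputs are scaled by the same factor, so the classifier is insensitive to overall brightness variations in the transparency.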

Husain [261] conducted a study to determine the best color filter
to use in a Quantimet monochromatic image processor (see below) to
establish a brightness threshold capable of reliably separating nucleus
from cytoplasm and malignant from non-malignant nucleus. His conclusion
was that "density alone cannot work satisfactorily" because of an
"unacceptable degree of overlap in some cases." Husain conducted a
further study which generally confirmed the earlier Cytoanalyzer
parameter analysis of Tolles et al.

Green used the histogram method to isolate and segment scanned
monochromatic photomicrograph images of red blood cells in an attempt
to develop quantitative morphologic measures equivalent to hematologists'
evaluations [60]. Green used noisy images of much reduced resolution
as compared to Prewitt and Mendelsohn's, and found that noise and quantization
errors required a more complicated method of locating peaks and valleys
in the histogram. A global histogram segmentation was followed by local
histogramming and segmenting of the red cells. Object recognition was
by fairly simple area and perimeter vs. area criteria. A large number
of shape and texture features were extracted, including area, total
optical density, and eccentricity. The primary objective was to extract
these quantitative shape measurements, rather than to arrive at a
diagnostic decision.

The Vickers track-laying system mentioned in Section III.B.3 was
intended to be used with a very simple optical device for locating dark
blobs of size greater than 12 microns in cervical cell samples [135].
A fair amount of work was done to show that this would be an adequate
parameter for pre-screening [175]. However, this claim seems to have
met with massive skepticism [224, 169]. A more sophisticated texture
and shape analysis was proposed by McMaster [125], with the 12-micron
criterion to be used for quickly locating objects of interest [251].
Low-noise, high-resolution pictures digitized from photomicrographs of
individual, isolated cells were used. Five parameters were extracted:
average transmission, average transmission difference in adjacent picture
elements, average difference in successive centroid-nuclear border radii,
average radius, and ratio of maximum to minimum radius. A product, P,
of weighted functions of these parameters was formed, with the weights
computed by maximizing the difference of the extreme values of P for a
small training set of 18 normal basal cells and 11 malignant cells. No
effort was made to try the classifier on any cells not in the training
set.
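
The three radius-based parameters in this list can be computed from an ordered set of border points, as the following sketch shows. It is illustrative only: the centroid is taken as the mean of the border points, the border is treated as a closed contour, and the function name is hypothetical. The weighted product P is not reproduced because the source does not give its functional form.

```python
import math


def radial_features(border):
    """Return (mean radius, mean successive radius difference, max/min ratio).

    `border` is a list of (x, y) points around the nuclear border, in order.
    The centroid is approximated by the mean of the border points.
    """
    n = len(border)
    cx = sum(x for x, _ in border) / n
    cy = sum(y for _, y in border) / n
    radii = [math.hypot(x - cx, y - cy) for x, y in border]
    mean_radius = sum(radii) / n
    # Differences between successive radii, wrapping around the contour.
    mean_step = sum(abs(radii[i] - radii[i - 1]) for i in range(n)) / n
    ratio = max(radii) / min(radii)
    return mean_radius, mean_step, ratio


round_nucleus = [(5, 0), (0, 5), (-5, 0), (0, -5)]
elongated_nucleus = [(4, 0), (0, 2), (-4, 0), (0, -2)]
print(radial_features(round_nucleus))
print(radial_features(elongated_nucleus))
```

A circular outline yields a zero mean difference and a ratio of one, while irregular or elongated outlines raise both values, which is the sense in which these parameters measure shape.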

Similar measurements were automatically extracted from
photomicrographs of cervical smears by a system of Rosenberg and Ledeen [179].
Object location and identification was accomplished by a raster search
stopped by a preset threshold density, followed by size and shape analysis.
Measurements were extracted (average nuclear radius, average deviation
of radii, nuclear area, average nuclear density) and conditional
probability distributions computed for 100 normal and 100 malignant cells
(pre-identified). Means and standard deviations agreed closely with
Reagan and Wied (see Section II.C.1). However, conditional distributions
published in [179] show considerable overlap between normal and
malignant cells, suggesting that the features extracted are not very good
for cancer detection. This is rather interesting, considering that
these measurements supposedly reflect criteria used by human screeners.
Also, Rosenberg and Ledeen fail to mention whether the cell samples were
from more than one person. No classification of unknowns was attempted
using these conditional distributions.

A series of instruments for sizing and counting of various microscopic
objects was described by Mansberg [111-116]. All used a one-dimensional
approach and assumed that objects are fairly predictable in
density, well separated and not too complex in shape. A prototype
scanner  was  described  which  was  intended  to  scan  an  entire  sectioned 
human  brain  (6000  whole-section  slides)  in  1200  hours  to  attempt  to 
establish  a  quantitative  correspondence  between  brain  lesioning  and 
neuron  depopulation.   To  do  this,  it  is  necessary  to  distinguish  glial 
cells  from  neurons.   Since  shape  analysis  is  difficult  with  one-dimensional 
processing,  this  object  identification  problem  was  attacked  by  trying 
to  defocus  the  spot  enough  to  eliminate  detection  of  the  smaller  glial 
cells.   However,  problems  were  encountered  in  controlling  the  spot  size 
variation, and at last notice (1964), the glial cell/neuron discrimination
problem had not been solved [113]. Mansberg also described a
system for counting fluorescing biological objects stained with the
fluorescent antibody technique [116] (see the previous chapter).

Another scanner for making fluorescence measurements on slide-transported
smears was proposed in 1951 by Mellors, Papanicolaou, et al.
[127-130] for automating cervical cancer detection.

The Quantimet, a vidicon scanning instrument with one-dimensional
analysis, was developed by Image Analysing Computers, Ltd. for counting
and sizing microscopic inclusions in metallurgical specimens. The same
device was used for counting goblet cells in specially-stained sections
of lungs of rats exposed to sulfur dioxide. However, the machine counts
were not directly comparable to manual counts, since the Quantimet could
not distinguish stained goblet cells from random blobs of stain [120].
A more sophisticated version of Quantimet was used in some experiments
to determine its usefulness in cervical cancer screening [255, 261].
Slide transport and standard Papanicolaou staining were used, although
there was a desire to couple the Quantimet to the Tetronics trace-laying
machine [261]. Newer versions of the Quantimet include an ability to
extract  several  morphological  features  such  as  integrated  density, 
perimeter,  or  presence  of  second  phases  (nuclei). 

Both morphological and non-morphological analysis occur in the
Cytoscreener of Nuclear Research Associates [62], in which a programmed
ultraviolet CRT flying-spot scanner generates a raster on cells transported
by  a  laminar  flow  system.   The  system  measures  total  UV  absorption  as 
a  measure  of  DNA  content,  and  also  analyzes  nuclear  size,  nuclear  density, 
cell  symmetry  and  nuclear-cytoplasmic  ratio  using  "a  pattern-recognition 
computing  system"  not  further  described.   The  Cytoscreener  also  has  an 
object-location  capability  in  that  it  can  recognize  and  ignore  fragmented 
or  clumped  cells  or  debris.   All  this  occurs  at  a  rate  of  approximately 
5000  cells  per  minute,  which  is  rather  slow,  since  a  cell  sample  may 
contain 100,000 cells or more. Performance of the Cytoscreener in a
clinical trial using cervical cell samples was fairly impressive. One
hundred specimens were used, of which 24 were known to be from cancer
patients, and 76 were known to be from normal patients. The specimens
were  analyzed  by  conventional  cytology  and  also  by  the  Cytoscreener. 
The  false  negative  rate  was  8.3%  for  the  Cytoscreener  and  12.5%  for  the 
humans.   However,  only  22  of  76  normals  were  screened  out  by  the  machine, 
partly  because  31  of  the  76  were  rejected  as  having  insufficient  cellular 
material  for  processing.   (Compare  with  Rapid  Cell  Spectrophotometer 
performance, next section.)

George Wied and his associates have conducted extensive investigation
into image parameters to distinguish malignant and normal cells.
According to Wied [363], "there are two roots to the application of
pattern recognition principles to biological cells. One has its origin
in development of methods and instruments for quantitative cytochemistry...
The  other  root...  is  found  in  commercial  and  academic  interests  for 
automating  clinical  microscope  screening  procedures." 

Wied  has  developed  a  cell  recognition  system  (TICAS)  very  strongly 
based  in  quantitative  cytochemistry.   Cells  are  located  manually  and 
scanned with a very slow scanning microphotometer (17-20 msec per sample
point minimum). Three objectives are enunciated [238]: (1) description and
discrimination of cells which are known to be biologically different
but which are difficult to distinguish by standard techniques, (2) providing
computerized access to images of cells accompanied by expert
diagnostic opinions, (3) computer-aided instruction in cyto- and
histopathology. A system of remote access to TICAS via phone lines has been
proposed  [238]  and  now  implemented,  wherein  an  image  generated  by  a  Zeiss 
Cytoscan  is  sent  to  a  central  PDP-10  for  assessment.   A  large  number  of 
reports  on  investigations  along  these  lines  have  been  published  [226-233, 
238, 13-18, 20, 265]. To show that machine analysis could make discriminations
difficult or impossible for humans observing standard cytologic
preparations, Wied and his colleagues selected several difficult clinical
problems: (1) discrimination among uterine glandular cancer cells, normal
glandular cells and histiocytes [226, 229], (2) discrimination among
uterine glandular cancer cells, cells from a possibly precancerous condition
of the uterus (hyperplasia), and normal uterine glandular cells
[232], (3) discrimination between similar-appearing normal cells from
different parts of the uterus (endocervix and endometrium) [231, 228],
(4) discrimination between normal and leukemic white blood cells (lymphocytes)
[230, 15]. Also, studies were done in discrimination of cytochemically
and morphologically identical tissue culture cells from human
embryonic lung (HEL) and human epidermoid tumor (HEP) [227, 13, 14].
A  variety  of  methodologies  were  tried.   The  usual  histogram  of  optical 
density  values  received  considerable  massaging,  being  used  both  as  a 
source  of  features  (integrated  or  average  optical  density,  sum  of  five 
highest  density  values  with  non-zero  frequencies,  sum  of  frequencies  of 
specific  density  values)  for  constructing  linear  discriminant  functions, 
and as a means of calculating threshold values for segmenting the gray-value
image. In the latter technique, a composite optical density histogram
derived from the entire training set population for one cell type
is re-partitioned into unequal bin-widths so that each bin has approximately
the same frequency of occurrence (see Section VI.B.).

This is conceptually the same as recoding the image using a maximum-entropy
gray-level quantization, thresholding, and applying simple
shape-recognition  procedures  to  selected  binary  isodensity  discriminants. 
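
The equal-frequency re-partitioning described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the cited studies: the function names `equal_frequency_edges` and `quantize` and the density values are hypothetical. Bins that are equally populated give the quantized image the largest possible entropy for a given number of levels, which is why the recoding can be called maximum-entropy quantization.

```python
from bisect import bisect_right

def equal_frequency_edges(values, n_bins):
    """Return n_bins - 1 interior thresholds so that each bin holds
    roughly the same number of training values."""
    ordered = sorted(values)
    n = len(ordered)
    # Threshold k sits at the k/n_bins quantile of the pooled training data.
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]

def quantize(value, edges):
    """Map a density value to its bin index (0 .. len(edges))."""
    return bisect_right(edges, value)

# Hypothetical pooled optical-density values for one cell type.
densities = [3, 1, 4, 15, 9, 2, 6, 5, 35, 8, 97, 93]
edges = equal_frequency_edges(densities, 4)

# Count how many training values fall into each unequal-width bin.
counts = [0] * 4
for d in densities:
    counts[quantize(d, edges)] += 1

print(edges)   # bin boundaries have unequal spacing
print(counts)  # but each bin is equally populated
```

With tied density values the bins can only be approximately equal in population, which matches the "approximately the same frequency" wording of the text.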

Wied  and  his  colleagues  have  experimented  with  a  large  number 
of  features  and  a  wide  variety  of  classification  techniques.   Recent 
development of system programming to automate the search for good features
and  classifiers  should  be  a  significant  aid  to  exploration.   However, 
published  results  do  not  appear  to  include  investigation  of  the  crucial 
question  of  performance  of  features  and  classifiers  on  images  which  were 
not  contained  in  the  training  set  used  to  design  features  and  classifiers. 
This  is  typically  the  point  at  which  pattern  recognition  systems  have 
a  tendency  to  fall  apart.   Also,  the  imagery  used  in  Wied's  studies  was 
of  a  particularly  high  quality  in  terms  of  low  noise,  high  resolution 
and accuracy. Whether Wied's methods can be used under less favorable
circumstances is questionable, although it should be pointed out that
Wied's  stated  objectives  do  not  include  high-speed,  fully  automatic 
cell  sample  analysis. 

2.   Non-Morphological  Analysis 

Several systems have been developed using fluid-flow cell transport.
These systems capitalize on the ability to control cell positions
in a flow channel and typically make whole object (zero-resolution)
non-morphological measurements.

The Rapid Cell Spectrophotometer (RCS) of Kamentsky/IBM [84-91,
94] is a general purpose instrument which has been applied to cervical
cancer screening [94], blood cell differential counting [155], cell
viability  assay  [87,  126]  and  exploratory  studies  to  establish  photometric 
features  which  could  distinguish  populations  of  functionally  different 
cells  [89,  73].   The  RCS  is  capable  of  measuring  500  cells  per  second 
and  has  been  equipped  with  a  fluid-switch  cell  sorter. 

In  the  cervical  cell  application,  the  RCS  was  subjected  to  a 
clinical trial in 1965 [94]. Performance for cervical swab specimens
was not spectacular: 15% false negatives with 32% false positives, and
for vaginal wash specimens 50% false negatives for 25% false positives.
In addition, problems with sample preparation caused only 45% of the
cervical swab and 70% of the vaginal wash specimens to be usable. This
performance  was  apparently  found  to  be  unsatisfactory  for  clinical  use. 

The  Coulter  counter  is  a  highly  successful  fluid  transport  system 
which  has  enjoyed  extensive  clinical  and  research  use  for  establishing 
size  distributions  for  a  large  variety  of  objects  [253].   No  optical 
measurements  are  made;  the  objects  to  be  measured  are  pumped  in  an 
electrically conductive fluid medium through a small aperture in a
non-conducting plate. A current also flows through the aperture.
Passage of an object causes a change in resistance through the aperture,
generating an electric voltage pulse proportional to the volume of the
object. This device was used [97] in an unsuccessful [68] experiment
in cervical cancer screening. The Coulter counter has also been used
as the front end of other fluid-flow systems, since the Coulter pulse
can  be  used  to  control  a  variety  of  events. 

An electronic cell sorter built by Fulwyler [54] combines a
Coulter  counter  with  an  electrostatic  droplet  deflection  system  like 
the  one  used  in  the  Inktronic  printer,  and  can  sort  1000  cells  per  second 
on  the  basis  of  cell  volume.   A  stream  of  cell-containing  fluid  is 
broken  into  droplets  by  applying  a  vibration  to  the  nozzle  forming  the 
stream.   The  Coulter  volume  signal  is  used  to  apply  a  proportional 
electrical  charge  to  the  droplet  containing  the  cell.   The  droplets 
then  pass  through  an  electrostatic  field  which  deflects  the  cells 
according  to  the  volume-proportional  charge. 

The Automatic Multiparameter Analyzer for Cells (AMAC) proposed
by Leif [105] is also built around a Coulter-effect synchronizer, but
has a more elaborate sensor system with optical measurements available
at several wavelengths. A system described by Hulett et al. [73] utilizes
the droplet cell-sorter of Fulwyler, but substitutes a fluorescence
measurement for the Coulter-effect volume signal used to charge the
solute droplet.

A Crosland-Taylor laminar flow system by Van Dilla et al. [211]
uses  a  laser  excitation  source  and  a  multichannel  pulse-height  analyzer 
to  produce  a  frequency  distribution  histogram  of  fluorescent  light 
emission  per  cell  at  a  rate  of  up  to  100,000  cells  per  minute.   This 
instrument  was  used  in  a  study  to  establish  the  time-course  of  DNA 
synthesis  in  a  population  of  mouse  fibroblasts. 


These permutations and combinations of sensors, applications,
and particle transport methods are summarized in Table 4.

Table 4.   Flow Systems: Non-Morphological Measurements

Project/References:        Fulwyler [54, 262]
Application:               Sort by volume while retaining cell viability to
                           determine relation of volume to functional state.
Measurement/Performance:   Coulter volume. Record data with multichannel
                           pulse-height analyzer. 30,000-60,000 cells/min.
                           96% viability.
Transport/Sorting Method:  Electrostatic deflection of charged droplet system.

Project/References:        Van Dilla [211]
Application:               Measure fluorescence due to DNA content in Chinese
                           hamster ovary tissue-culture cells. Determine time
                           of DNA synthesis and duration of cell phases.
                           Improved statistical significance of large cell
                           sample size over studies using scanning
                           microspectrophotometers.
Measurement/Performance:   Fluorescent emission (Feulgen) with argon-ion
                           laser source at 488 nm. Record data with
                           multichannel pulse-height analyzer.
                           10,000-100,000 cells/min.
Transport/Sorting Method:  Crosland-Taylor laminar-flow system. No sorting.

Project/References:        Rapid cell spectrophotometer [84-91, 94, 126, 155]
Application:               Cervical cancer screening. Live/dead cell assay.
                           Differential white blood cell count. Population
                           studies.
Measurement/Performance:   Scattering of white light. Absorption at various
                           UV wavelengths. 60,000 cells/min.
Transport/Sorting Method:  Capillary tube transport with fluid switch sorting.

Project/References:        Mullaney [143-145]
Application:               Derive volume spectra of large cell populations.
                           Increased accuracy over Coulter counter for fixed
                           cells.
Measurement/Performance:   Volume measured by narrow-angle light scattering
                           (primarily diffraction). Laser light source at
                           632.8 nm. Data stored in a multichannel
                           pulse-height analyzer. 10,000-100,000 cells/min.
Transport/Sorting Method:  Crosland-Taylor laminar-flow transport. No sorting.

Project/References:        Hulett [73]
Application:               Separate mixed cell populations on the basis of
                           fluorochromasia developed by enzyme action on FDA.
                           Cell viability is retained.
Measurement/Performance:   Mercury-arc excitation source. Light pulse from
                           fluorescent cell activates charging pulse.
Transport/Sorting Method:  Electrostatic deflection of charged droplet stream
                           (see Fulwyler).

Project/References:        Technicon Corp. Hemalog-D
Application:               Differential white blood cell count (count white
                           cells classified in five types).
Measurement/Performance:   Measures fluorescence and absorption at various
                           wavelengths.
Transport/Sorting Method:  Capillary tube transport. No cell sorting.

IV.   EQUIPMENT AND PROCESSES USED

This  chapter  describes  some  facilities  which  were  used  in  this 
study.   Figure  7  is  a  block  diagram  showing  the  interconnection  of 
some  of  the  hardware  used  to  acquire  and  process  the  images.   Digitized 
images  from  programmable  film  and  microscope  scanners  were  loaded  into 
the  Illiac  III  core  memory  under  control  of  an  interim  software  system, 
Show-and-Tell [171], which provides operator control of image selection.
In  addition,  Show-and-Tell  provides  communication  via  a  high-speed  data 
link  with  an  on-line  image  analysis  package,  PAX  II,  running  on  the  IBM 
360/75.   PAX  II  is  a  parallel-image-processing  language  developed  by 
the  University  of  Maryland  from  the  original  simulator  of  the  Illiac  III 
Pattern  Articulation  Unit  [80].   In  the  experiments  in  texture  analysis, 
texture  samples  obtained  interactively  using  Show-and-Tell  were  fed  into 
programs developed by Jayaramamurthy, Tareski and Raulefs to implement
the  vari-valued  logic  approach  to  pattern  recognition. 

A.   Programmable  Scanners 

A  key  aspect  of  this  study  is  the  use  of  a  flexible  programmable 
scanner  in  which  the  parameters  of  the  image  acquired  are  under  computer 
control. The scanners used are the Illiac III's film scanner and microscope
scanner. Both are of the flying-spot type in which the
light  source  is  a  small  (ca.  1  mil)  spot  of  light  generated  by  a  Cathode 
Ray  Tube.   In  the  microscope  scanner,  light  from  the  spot  is  projected 
down  through  the  microscope  ocular  and  is  focused  by  the  objective  to  a 
smaller  spot  on  the  specimen.   A  Photomultiplier  tube  (PMT)  detects  the 
light  transmitted  by  the  specimen.   Another  PMT  independently  examines  the 
CRT spot brightness. These two PMT outputs are processed by analog
techniques to yield a signal whose amplitude is not significantly
influenced by variations in the intensity of the CRT spot. This signal
is digitized, currently to four bits, and the resulting numbers are
loaded into Illiac III core at a rate of approximately 500,000 per second.
By deflecting the spot in a regular, TV-like raster, a digital representation
of the image is formed in the Illiac III memory. Following the
notation of Rosenfeld [181], this representation will be indicated by
$(a_{ij})$, $1 \le i \le m$, $1 \le j \le n$, where m and n are the horizontal
and vertical dimensions (in picture elements) of the conceptual array storing
the picture. The resolution, proportions and placement of the sampling raster
are under computer control. Up to a 256 x 256 element image representation
can be contained in the memory as currently configured.
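
The digitization step can be illustrated with a short sketch. It rests on assumptions that are not in the text: the analog combination of the two PMT outputs is modeled here as a simple ratio of transmitted to reference signal, and the function name `normalized_sample` and the signal values are hypothetical. Only the four-bit depth, the 256 x 256 raster and the rate of about 500,000 samples per second come from the description above.

```python
def normalized_sample(transmitted, reference, levels=16):
    """Quantize transmitted/reference to an integer in 0 .. levels - 1.

    Taking a ratio is an assumption; the text only says that the two
    PMT signals are combined by analog techniques.
    """
    if reference <= 0:
        raise ValueError("reference must be positive")
    ratio = min(max(transmitted / reference, 0.0), 1.0)
    return min(int(ratio * levels), levels - 1)

# The same specimen transmission (60%) seen under two spot brightnesses
# should yield the same four-bit gray level.
a = normalized_sample(0.60, 1.00)
b = normalized_sample(0.48, 0.80)

# Time to load one full 256 x 256 raster at about 500,000 samples/second.
samples = 256 * 256
load_seconds = samples / 500_000

print(a, b, samples, load_seconds)
```

Under these assumptions a full-resolution frame takes on the order of a tenth of a second to load, which is consistent with treating the specimen as an on-line image source that is rescanned as needed.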

The  film  scanner  works  in  a  similar  fashion,  except  that  the 
raster and spot size conform to the format of a 46-mm. film frame.

A  monitor  system  permits  viewing  the  digitized  images,  either  in 
real-time  with  the  scanner,  or  by  retrieving  the  picture  element  values 
from memory. The adjustment of display parameters (magnification, location
on the screen) to improve the interpretability of the monitor display
is  also  under  computer  control.   Since  the  specimen  or  film  is  available 
on-line,  it  is  not  necessary  to  read  in  the  entire  image  at  one  time. 
Instead,  the  image  source  can  be  treated  as  a  read-only  memory  of  very 
large  capacity,  and  accessed  only  as  needed.   The  flexibility  of  the 
scanner  permits  a  quick  coarse-scan  look  at  large  areas  of  the  specimen 
followed  by  high-resolution  interpretations  of  areas  of  interest. 

The  microscope  stage  motion  and  focus  are  also  under  control  of 
the PDP8e/Show-and-Tell, via digital stepping motors.

B.   Show-and-Tell

Show-and-Tell  is  an  interactive  programming  system  designed  to 
permit  on-line  development  and  testing  of  scanning,  preprocessing  and 
feature  extraction  programs.   The  early  version  used  in  these  experiments 
was  intended  to  provide  control  of  image  acquisition,  display,  and  real- 
time communication  with  an  IBM  360,  which  would  then  provide  image 
analysis  via  PAX  II  (see  next  section),   [ill] 

In a typical development session, a programmer codes the processing
program as an IBM 360 FORTRAN subroutine, with calls to PAX II
as required. The subroutine can include calls to Show-and-Tell to type
messages to the programmer, read data typed by the programmer, display
or scan images, and transfer pictures to and from the 360. This program
is submitted through the OS/360 batch system. When it begins execution,
the programmer is informed, and he may begin testing his program
by  executing  a  Show-and-Tell  CALL  statement.   Intermediate  results  can 
be  displayed  on  the  monitor,  and  various  parameter  values  tested.   Other 
commands  can  cause  images  to  be  saved  on  360  tape  or  disk  for  later  use 
as  data  for  off-line  testing. 


C.   Parallel  Image  Processing  and  PAX  II 

Over  the  past  twenty  years  or  so,  a  paradigm  has  evolved  for 
reducing  image  data:   a  series  of  local  operations  converts  the  input 
image  into  an  image  or  set  of  images  from  which  the  desired  data  can  be 
relatively easily obtained. This approach apparently originated with
Selfridge and Dineen [272, 273] and has been followed by many others,
e.g. [258, 257, 270]. For a digital input picture, $(a_{ij})$, a local
operation generates a new picture $(b_{ij})$, where the value of each output
picture element, $b_{ij}$, depends on the picture elements in some relatively
small neighborhood of the corresponding $a_{ij}$. Such an operation can be
performed
simultaneously  for  all  the  elements  in  a  picture,  since  the  output  at 
each  point  depends  only  on  the  original  values  of  the  neighbors.   This 
definition  follows  that  of  Rosenfeld,  who  has  investigated  the 
alternative  case  where  each  new  value  depends  also  on  the  new  values  of 
some of the neighbors [247]. In this case the local operation must be
implemented  serially.   Rosenfeld  showed  that  parallel  local  operations 
can  in  principle  do  anything  serial  local  operations  can  do  and  vice 
versa, although with efficiency tradeoffs. If the local operations are
homogeneous, i.e., the same function rule is applied at each location,
then it becomes economically feasible to build a hardware processing
array with a shared control to efficiently implement the operations.
The hardware array processor can be expected to perform parallel local
operations with a throughput improvement over conventional organization
on the order of pq:1, where p and q are the dimensions of the array.
This was in fact realized in the Pattern Articulation Unit of the Illiac
III, which consists of 1,024 identical processors (stalactites) arranged
in a 32 x 32 array [123]. Each processor can communicate directly with
as many as eight (rectangular topology) or six (hexagonal topology)
nearest neighbors. Local operations involving more distant neighbors can
be performed by shifting. The fundamental instruction set, discussed
in detail in the Illiac III system manuals [275], includes four basic
classes of instructions: (1) forming logical functions of the contents
of each stalactite and nearest neighbors (including simple arithmetic
functions); (2) shifting; (3) propagation; and (4) image loading, marking
and  readout  of  results. 
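
The distinction between parallel and serial local operations drawn above can be made concrete with a small sketch. It is an illustration only, not Illiac III or PAX code: the function name `local_max`, the choice of a four-neighbor maximum as the local rule, and the test image are all hypothetical.

```python
def local_max(image, serial=False):
    """Replace each element by the maximum over itself and its four
    nearest neighbors.

    Parallel mode reads only ORIGINAL values, so every output element
    could be computed simultaneously. Serial mode writes into the array
    it is reading, so elements visited later in the raster see the new
    values of neighbors visited earlier.
    """
    rows, cols = len(image), len(image[0])
    source = [row[:] for row in image]
    # Serial: read from the array being written. Parallel: frozen input.
    target = source if serial else [row[:] for row in image]
    for i in range(rows):
        for j in range(cols):
            best = source[i][j]
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                r, c = i + di, j + dj
                if 0 <= r < rows and 0 <= c < cols:
                    best = max(best, source[r][c])
            target[i][j] = best
    return target

image = [[1, 0, 0],
         [0, 0, 0],
         [0, 0, 0]]

parallel = local_max(image)              # the 1 spreads by one step only
serial = local_max(image, serial=True)   # the 1 sweeps across the raster

print(parallel)
print(serial)
```

One pass of the serial rule carries the value across the whole array, whereas the parallel rule would need repeated passes to reach the same result; this is one face of the efficiency tradeoff mentioned in the text.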

These  same  instructions  were  simulated  on  a  conventional  computer 
(an  IBM  7090)  in  the  original  PAX  picture  processing  software  system. 
By  treating  the  accumulator  as  one  planar  row  of  the  PAU,  a  modicum  of 
parallelism  was  achieved.   Workers  at  the  University  of  Maryland  extended 
and  improved  the  PAX  system,  and  embedded  it  in  FORTRAN  IV  [80].   The 
resulting PAX II system therefore combines the numerical processing convenience
of FORTRAN with a set of FORTRAN-callable subroutines to implement
the parallel instructions. A data management system for storing
and accessing the picture arrays is also provided.
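
The idea of treating one machine word as a row of picture elements can be sketched as follows. This is not PAX code (PAX ran on the IBM 7090 and PAX II was embedded in FORTRAN IV); it is a hypothetical Python illustration in which an integer stands in for the accumulator, and the names `pack_row`, `unpack_row`, `erode_row` and the row width are assumptions.

```python
WIDTH = 8  # hypothetical row width in picture elements

def pack_row(bits):
    """Pack a list of 0/1 picture elements into one integer
    (leftmost element in the most significant bit)."""
    word = 0
    for b in bits:
        word = (word << 1) | b
    return word

def unpack_row(word, width=WIDTH):
    """Inverse of pack_row."""
    return [(word >> (width - 1 - k)) & 1 for k in range(width)]

def erode_row(word, width=WIDTH):
    """Keep an element only if it and both horizontal neighbors are 1.
    One shift-and-AND sequence processes every element of the row at once."""
    mask = (1 << width) - 1
    west = word >> 1            # each position now holds its left neighbor
    east = (word << 1) & mask   # each position now holds its right neighbor
    return word & west & east

row = [0, 1, 1, 1, 0, 1, 1, 0]
result = unpack_row(erode_row(pack_row(row)))
print(result)
```

A local operation on a whole row costs a handful of word instructions regardless of the row width, which is the modest form of parallelism available on a conventional machine.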

D.   Texture Recognition using Varivalued Logic

Varivalued  logic,  introduced  by  McCormick  and  Michalski  [136,  137] 
is  an  extension  and  generalization  of  the  binary-valued  logic  applied  to 
switching  theory.   In  this  thesis,  algorithms  developed  to  support  the 
theory of varivalued logic were used to automatically generate local operations
capable of discriminating between two textures of cytological
interest. As mentioned above, a local neighborhood of an element in a
digital picture is some subset of the nearby picture elements. For example,
in a rectangular sampling array each picture element and its nearest neighbors
to the east, west, south, southeast, and southwest form a 3 x 2 local
neighborhood. An m x n local neighborhood can be represented as an
mn-dimensional vector, e.g.,

$$(x_1, x_2, \ldots, x_{mn})$$

and can be regarded as an event in an mn-dimensional sample space. If
the digital picture was quantized to h gray levels, then the sample space
contains $h^{mn}$ distinct events.
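
The notion of a local neighborhood as an event can be sketched for the one-dimensional case. This is a minimal illustration; the function names `events_1d` and `sample_space_size` and the signal are hypothetical and are not the data of the thesis figures.

```python
from collections import Counter

def events_1d(signal, width=3):
    """Slide a 1 x width template along a 1-D signal; each window is an
    event, represented as a tuple of gray levels."""
    return [tuple(signal[k:k + width]) for k in range(len(signal) - width + 1)]

def sample_space_size(gray_levels, m, n):
    """Number of distinct events for an m x n neighborhood: h ** (m * n)."""
    return gray_levels ** (m * n)

# Hypothetical signal quantized to four gray levels (0 .. 3).
signal = [1, 2, 3, 2, 1, 2, 3, 2]
occurrences = Counter(events_1d(signal))

print(occurrences)
print(sample_space_size(4, 1, 3))   # 1 x 3 template, four gray levels
print(sample_space_size(4, 3, 2))   # 3 x 2 template, four gray levels
```

The sample space grows very quickly with template size, so a training set normally visits only a small fraction of the possible events.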

In  the  procedure  described  below,  a  training  set  of  digitized 
samples  of  both  textures  is  obtained.   Using  methods  of  statistical  decision 
theory, each different event (local neighborhood) occurring in the training
set is assigned to one or the other of the textures. Events in the
sample  space  that  did  not  occur  in  the  training  set  are  assigned  to  a 
DON'T  CARE  class.   Having  done  this,  one  of  the  textures  is  regarded  as  a 
"true"  set  and  an  analogy  can  be  drawn  between  the  events  assigned  to  the 
true  class  and  the  minterms  of  the  disjunctive  normal  form  of  a  switching 
function.   This  analogy  is  pursued  and  a  suitably  modified  minimization 
procedure  discovers  an  entity  called  an  "interval  cover,"  that  defines  a 
simplified  categorizer  preserving  the  assignment  of  events  to  texture 
classes  much  as  a  prime  implicant  cover  defines  a  simplified  switching 
circuit  preserving  some  desired  truth  table. 

The  details  of  the  procedure  are  presented  by  means  of  a  simple 
one-dimensional  "texture"  example.   The  purpose  is  to  communicate  a  general 
understanding of the concepts used in this thesis; a precise, formal exposition
by the originators of the theory is contained in [136] and [137].

(1)   Defining the categorizer by means of signal detection theory:
The purpose of this phase of the procedure is to assign each event in the
training set to one or the other of the texture classes so that an unambiguous
"true" set is defined for input to the interval covering process.
Statistical  decision  theory  provides  a  systematic  way  of  doing  this  so  that 
certain objectives are fulfilled optimally.

Let $E^1$ and $E^0$ be the sets of different events obtained from the
texture samples $T^1$ and $T^0$, respectively, and let

$n_1(e_k)$ = Number of occurrences of the event $e_k$ in $T^1$ (the
number of "hits").

$n_0(e_k)$ = Number of occurrences of the event $e_k$ in $T^0$ (the
number of "false alarms").

$n_{T^1}$ = Number of events in $T^1$.

$n_{T^0}$ = Number of events in $T^0$.

In the one-dimensional example of Figure 8(a), the events are 1x3
local neighborhoods; the "textures" have been quantized to four levels, so
the three-dimensional sample space contains 64 possible events. The nine
events of $E^1$ and the seven events of $E^0$ are listed in the first column of
Table 5.

This information can be used to effect a disjoint partition of
$E^1 \cup E^0$ that is optimal in the sense that certain decision objectives are
satisfied as well as they can be, given the inherent separability or
non-separability of the sample data. For example, it may be assumed that all
misclassifications are equally costly and that one simply wants to minimize
the number of errors. It can be easily demonstrated [59] that the sum of
the probabilities of the two types of errors (saying that an event e is from
$T^1$ when it is really from $T^0$ [a false alarm] and vice versa [a miss]) is
minimized when an event is categorized in $T^1$ if its likelihood ratio (LR)
is greater than 1.

The likelihood ratio of an event $e_k$ is defined as

$$LR(e_k) = \frac{P(e_k|T^1)}{P(e_k|T^0)}$$

where

$P(e_k|T^1)$ = The probability of $e_k$ occurring in $T^1$, i.e., the
probability of $e_k$ conditional on $T^1$,
= $n_1(e_k)/n_{T^1}$.

$P(e_k|T^0)$ is defined similarly.

Figure 8(a).   One-dimensional "Textures," Quantized to Four Gray Levels

Table 5.   Statistics Derived from the One-dimensional Texture
of Figure 8(a) using a 1x3 Template to Define Events

Event (e)    P(e|T^1)    P(e|T^0)    LR(e)

1,2,3          .15          0          ∞
2,3,2          .25          0          ∞
2,1,2          .10          0          ∞
1,0,2          .05          0          ∞
0,2,3          .10          0          ∞
3,2,0          .05          0          ∞
2,0,2          .05          0          ∞
3,2,1          .20         .15        1.33
2,1,0          .05         .20         .25
1,3,2           0          .15          0
1,0,1           0          .20          0
0,1,2           0          .10          0
1,2,1           0          .10          0
0,1,3           0          .10          0

(The entries of the $n_1(e)$ and $n_0(e)$ columns are not legible in this copy.)



It can be seen by looking at Table 5 that this decision goal is
realized in the one-dimensional example.  If only those events with LR > 1
are called T^1, then four misclassifications (three false alarms and one
miss) result.  If any other subset is classed T^1, more errors occur.  Other
decision goals can be realized by using a different likelihood ratio
threshold [59].  For generality the threshold can just be designated θ.
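
The error count quoted for Table 5 can be checked with a few lines of code.
The following is a minimal sketch, not the program used in this work.  It
assumes that each texture contributes 20 events, so that the tabulated
probabilities become integer counts; the names COUNTS_T1, COUNTS_T0,
likelihood_ratio, and errors are hypothetical.

```python
import math

# Event counts recovered from Table 5, assuming 20 events per texture
# (probability x 20).  These integers are an assumption of this sketch.
COUNTS_T1 = {(1, 2, 3): 3, (2, 3, 2): 5, (2, 1, 2): 2, (1, 0, 2): 1, (0, 2, 3): 2,
             (3, 2, 0): 1, (2, 0, 2): 1, (3, 2, 1): 4, (2, 1, 0): 1}
COUNTS_T0 = {(3, 2, 1): 3, (2, 1, 0): 4, (1, 3, 2): 3, (1, 0, 1): 4,
             (0, 1, 2): 2, (1, 2, 1): 2, (0, 1, 3): 2}


def likelihood_ratio(event):
    """LR(e) = P(e|T1) / P(e|T0); infinite when e never occurs in T0."""
    p1 = COUNTS_T1.get(event, 0) / sum(COUNTS_T1.values())
    p0 = COUNTS_T0.get(event, 0) / sum(COUNTS_T0.values())
    return math.inf if p0 == 0 else p1 / p0


def errors(theta):
    """Return (false alarms, misses) when events with LR > theta are called T1."""
    observed = set(COUNTS_T1) | set(COUNTS_T0)
    accepted = {e for e in observed if likelihood_ratio(e) > theta}
    false_alarms = sum(COUNTS_T0.get(e, 0) for e in accepted)
    misses = sum(n for e, n in COUNTS_T1.items() if e not in accepted)
    return false_alarms, misses


print(errors(1.0))  # (3, 1): three false alarms and one miss
```

Raising or lowering the threshold trades one kind of error for the other;
in this sketch, for instance, errors(2.0) removes every false alarm at the
cost of more misses.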

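With this background, we are ready to partially define the
categorizer Ψ_R on the basis of the training set information.

Let

$$
\begin{aligned}
E &= \text{event space},\\
F^{1\theta} &= \{\, e \mid e \in E^1 \cup E^0 \text{ and } \mathrm{LR}(e) > \theta \,\},\\
F^{0\theta} &= \{\, e \mid e \in E^1 \cup E^0 \text{ and } \mathrm{LR}(e) \le \theta \,\},\\
F^{*} &= \{\, e \mid e \in E - (E^1 \cup E^0) \,\}.
\end{aligned}
$$

Then define Ψ_R by its acceptance set R, i.e.,

$$\Psi_R(e) = 1 \iff e \in R,$$

where

$$F^{1\theta} \subseteq R \subseteq F^{1\theta} \cup F^{*}$$

and

$$R \cap F^{0\theta} = \emptyset.$$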


Note  that  the  determination  of  which  events  in  F*  are  in  R  has  not  been 
made  at  this  point;  these  represent  DON'T  CARE  events  that  are  assigned 
as  described  in  (2)  below. 

The receiver-operating-characteristic (ROC) curve is a useful device
for observing and predicting the behavior of these categorizers.  To make
the curve, each event e ∈ E^1 ∪ E^0 is regarded as a two-component vector
with x = p(e|T^0) and y = p(e|T^1).  An ordering can be imposed on these
vectors by sorting them in descending order by the likelihood ratios of
the e.  The curve is generated by placing the tail of the first vector at the
origin and then concatenating the rest in order.  For the one-dimensional
example, the graph shown in Figure 8(b) is the result.  The ROC displays
several useful items of information in an easy-to-see form.  For one thing,
the training-set performance of a categorizer for each value of θ is shown
directly, since for each threshold the y coordinate is equal to

$$\sum_{\{e \mid \mathrm{LR}(e) > \theta\}} p(e \mid T^1)$$

and the x coordinate is equal to

$$\sum_{\{e \mid \mathrm{LR}(e) > \theta\}} p(e \mid T^0).$$

Figure 8(b).   Receiver Operating Characteristic
Corresponding to Table 5

Figure 8(c).   Generalized Logic Diagram with Interval Covering
of T^1 against T^0.   L_1 is a product of literals in X_2 and X_3;
L_2 of literals in X_1, X_2, and X_3; L_3 of literals in X_1 and X_2.

The point on the ROC corresponding to a given value of θ is easy to find
since it is the tail of the vector with slope = θ.  (Note that θ only has
a finite number of values with different performance effects.)  The ROC
also provides a measure of the inherent separability of the textures in the
training set.  The area under the curve is equal to 0.5 if the textures are
indistinguishable (all events occur with equal probability in both
textures) and is equal to 1.0 if the textures are perfectly distinguishable
(all events occur in one or the other texture but not both).
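
The construction of the ROC can be illustrated on the Table 5 values.  The
sketch below is an assumption-laden illustration rather than the original
program: TABLE_5 holds the (P(e|T^1), P(e|T^0)) pairs read from the table,
and roc_points and area_under are hypothetical helper names.

```python
import math

# (P(e|T1), P(e|T0)) for the fourteen events of Table 5.
TABLE_5 = [(.15, 0), (.25, 0), (.10, 0), (.05, 0), (.10, 0), (.05, 0), (.05, 0),
           (.20, .15), (.05, .20), (0, .15), (0, .20), (0, .10), (0, .10), (0, .10)]


def roc_points(table):
    """Concatenate the (x, y) = (P(e|T0), P(e|T1)) vectors in descending LR order."""
    def lr(row):
        p1, p0 = row
        return math.inf if p0 == 0 else p1 / p0

    points, x, y = [(0.0, 0.0)], 0.0, 0.0
    for p1, p0 in sorted(table, key=lr, reverse=True):
        x, y = x + p0, y + p1
        points.append((x, y))
    return points


def area_under(points):
    """Trapezoidal area under the piecewise-linear ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


points = roc_points(TABLE_5)
print(round(area_under(points), 4))  # 0.9725
```

The ninth point of the list, reached after the eight events with LR > 1,
lies at approximately (0.15, 0.95), the operating point for θ = 1.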

(2)   Implementing the categorizer by means of varivalued logic.
In principle the local categorizer described in the preceding section could
be implemented by just looking up input events in a table of events and
likelihood ratios.  However, for real textures and useful neighborhood
sizes this process would be hopelessly slow.  Also, no categorization would
be performed for events not in the training set.  By applying some concepts
from switching theory, equivalent but much more efficient categorizers can
be generated.  This is accomplished by a technique analogous to
switching-theoretic procedures for minimization of the disjunctive normal
form of a switching function.

If Table 5 is viewed as a truth table where events in F^{1θ} are true
and the others false, then the disjunctive normal form can be expressed
as ∨ ℓ_i(e_i), where ℓ_i is a predicate that has output true when the input
is a particular event e_i from F^{1θ}, and output false otherwise.  The symbol
∨ represents the logical OR of the predicates.  McCormick and Michalski
have developed a generalization of switching theory [137] that permits the
transplantation of much of the minimization machinery already in existence.
In particular, Michalski's A^q algorithm for generation of quasi-minimal
covers can be used.  To explain the method it is necessary to introduce
a few items of notation from [137].
E is the event space as before, i.e., the set of all events

$$e = (x_1, x_2, \ldots, x_n), \qquad 0 \le x_i \le h-1.$$

A literal, ^{a_i}X_i^{b_i}, is the set of all events e ∈ E whose i-th
component lies between a_i and b_i:

$${}^{a_i}X_i^{b_i} = \{\, (x_1, x_2, \ldots, x_n) \mid a_i \le x_i \le b_i \,\}.$$

An interval is a set-theoretic product of literals:

$$L = \prod_{i \in I} {}^{a_i}X_i^{b_i}, \qquad I \subseteq \{1, 2, \ldots, n\}.$$

The interval represents a "box" in hyperspace that includes all events
between (a_1, a_2, ..., a_n) and (b_1, b_2, ..., b_n).  Note that components
not specified by the interval are free to take on any integer value in
[0, h-1].
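
Membership in an interval reduces to a bounds check on the components that
the interval specifies.  The following sketch assumes a dictionary
representation (0-based component index mapped to its bounds); the interval
shown is hypothetical and is not claimed to be one of those in Figure 8(c).

```python
from itertools import product

H, N = 4, 3  # gray levels and template length of the one-dimensional example


def in_interval(event, interval):
    """interval maps a 0-based component index i to its literal bounds (a_i, b_i)."""
    return all(a <= event[i] <= b for i, (a, b) in interval.items())


# A hypothetical interval with literals on the first two components only;
# the third component is unspecified and therefore free.
box = {0: (2, 3), 1: (2, 3)}
members = [e for e in product(range(H), repeat=N) if in_interval(e, box)]
print(len(members))  # 2 * 2 * 4 = 16 events
```

Leaving a component out of the dictionary corresponds to leaving its literal
out of the product, which is what lets a single box stand for many events.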

An interval cover of the set F^{1θ} against F^{0θ} is defined as a union
of intervals L_j such that

$$F^{1\theta} \subseteq \bigcup_j L_j \subseteq F^{1\theta} \cup F^{*}.$$

Thus an interval cover contains all the events in F^{1θ} plus some in F*,
but none in F^{0θ}.  However, the interval cover will represent this
partitioning of the space of possible events much more concisely than just
enumerating all the events that should be labeled T^1.  Also, the interval
cover can classify events that were not in the training set because of
the inclusion of F* events in the "boxes."  A quasiminimal cover can be
generated via the A^q algorithm, which can only be sketched here.

Referring again to our usual example, an interval cover (a minimal
one, as it turns out) can be generated manually by means of a visual aid,
the generalized logic diagram (GLD), which was introduced by Michalski
[136], Figure 8(c).  In the particular case when variables assume only
two values, the GLD reduces to a diagram that resembles a Marquand-Veitch
diagram.  The GLD is a representation of the entire event space, E: 64
events in this case.  To use it, the events of F^{1θ} are mapped in as ones
and those of F^{0θ} as zeros.  The squares left over represent F* (DON'T
CARES).  The cover is found by an iterative procedure that begins by
picking the first "one" encountered in a TV-like scan of the GLD and
discovering all of the maximal intervals that include that one, but no
zeros (an interval "star").  One of these, the interval including the
most ones, is added to the covering set (initially empty).  All of the
ones included in the star are temporarily eliminated and the scan of the
GLD is resumed.  The first one encountered is again selected and the
iteration repeats.  Eventually all the ones have been eliminated.  If all
ones are included in the covering set, then the cover is minimal.
Otherwise, the cover is patched up to include the neglected events, and may
not be minimal.
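
The iterative procedure can be imitated by brute force on the 64-event space
of the example.  What follows is a simplified sketch and not the A^q
algorithm itself: the star is found by exhaustive enumeration, lexicographic
order stands in for the TV-like scan, and the names star and cover are
hypothetical.  ONES and ZEROS are the events of Table 5 with LR > 1 and
LR ≤ 1, respectively.

```python
from itertools import product

H = 4  # gray levels per component
ONES = [(1, 2, 3), (2, 3, 2), (2, 1, 2), (1, 0, 2), (0, 2, 3), (3, 2, 0), (2, 0, 2), (3, 2, 1)]
ZEROS = [(2, 1, 0), (1, 3, 2), (1, 0, 1), (0, 1, 2), (1, 2, 1), (0, 1, 3)]


def contains(box, event):
    """A box is a tuple of (a_i, b_i) bounds, one pair per component."""
    return all(a <= x <= b for (a, b), x in zip(box, event))


def inside(inner, outer):
    return all(oa <= ia and ib <= ob for (ia, ib), (oa, ob) in zip(inner, outer))


def star(seed, zeros, h=H):
    """All maximal boxes that contain seed and no zero (exhaustive search)."""
    axes = [[(a, b) for a in range(s + 1) for b in range(s, h)] for s in seed]
    valid = [box for box in product(*axes) if not any(contains(box, z) for z in zeros)]
    return [b for b in valid if not any(b != o and inside(b, o) for o in valid)]


def cover(ones, zeros):
    ones = sorted(ones)              # stand-in for the TV-like scan order
    chosen, remaining = [], list(ones)

    def best(boxes):                 # the star member holding the most ones
        return max(boxes, key=lambda box: sum(contains(box, e) for e in ones))

    while remaining:
        boxes = star(remaining[0], zeros)
        chosen.append(best(boxes))
        # Temporarily eliminate every one that falls anywhere in the star.
        remaining = [e for e in remaining if not any(contains(b, e) for b in boxes)]
    for e in ones:                   # patch-up pass for neglected ones
        if not any(contains(b, e) for b in chosen):
            chosen.append(best(star(e, zeros)))
    return chosen


intervals = cover(ONES, ZEROS)
```

The result is guaranteed only to include every one and exclude every zero;
whether it matches the three-interval cover found with the GLD depends on
the scan order and on how ties among star members are broken.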

This procedure was followed for the example and a minimal covering
using three intervals resulted (Figure 8(c)).  If these are used to form
a categorizer Ψ_R, where Ψ_R(e) = 1 for e ∈ R = L_1 ∪ L_2 ∪ L_3, then the
event categorization shown in Figure 8(a) results.  The asterisks appearing
above Texture 1 and Texture 0 indicate events for which Ψ_R had output
equal to 1.  The subscripts on the asterisks denote the interval producing
the hit.  Notice that the probability of a hit (19/20 = 0.95) and the
probability of a false alarm (3/20 = 0.15) are as predicted by the ROC for
the likelihood ratio decision rule with θ = 1.  These textures could easily
be discriminated by labeling regions with hit density over some averaging
aperture greater than, say, 55 percent as Texture 1.  In a digital parallel
processor, like the Illiac III's pattern articulation unit, Ψ_R can be
implemented as a local operation.  The input to the operation is a digitized
picture in several gray values and the output is a binary plane labeling
each element in the input picture as to which texture the picture element
most likely belongs.  The application of simple smoothing or noise-removal
algorithms would then make segmentation into texture regions relatively
easy.




V.   CYTOLOGICAL  IMAGE  PROCESSING  EXPERIMENTS 

A.   Blob  Detection 

One cell component which must always be present in a cell is
the nucleus.  An object on the slide is not a cell unless it has a nucleus, or
at least if it has no nucleus it is of no interest for present purposes.  In
stained preparations, the nucleus is also the most invariant of the
cell components in the sense that it is almost always a fairly symmetrical,
round object which is darker than its surroundings.  It makes sense,
then, to begin the smear analysis process by testing the image for the
presence of these round dark objects, which are referred to below as
blobs.  As has been discussed in Chapter III.C.1, it is necessary that
blobs caused by clumps of leukocytes be masked off in some way.  In
addition, the blob finder should operate on images of low resolution
which contain some noise.

A blob detector with these characteristics was implemented in
the form of the PAX II subroutine 'BLOB'.  A listing of BLOB (and the
other subroutines mentioned in this thesis) is provided in the Appendix
for readers with knowledge of PAX II.  BLOB is essentially a predicate
which has output TRUE wherever a region satisfying the parameters is
sensed.  BLOB functions by examining the neighbors of each picture
element, a_ij, looking for a decrease in gray value occurring at a distance
falling in the range [IRADLB, IRADUB], where IRADUB and IRADLB are parameters
passed to the subroutine.  BLOB requires that all neighbors at a distance less
than IRADLB be no more than IDROP less than a_ij in gray value, where IDROP is
another parameter.  On the other hand, at least one neighbor which falls in the
distance range [IRADLB, IRADUB] along each principal direction must be more
than IDROP less than a_ij.  Thus, BLOB looks for points falling on local
hillocks in the gray-value topography of the image by sending a probe
in each of eight directions from each a_ij, and requiring that the gray
value drop off before distance IRADUB, but that the gray value not drop
off before distance IRADLB.  Further, a "hit-count" is maintained for each
a_ij of the number of such radii which found drop-offs as specified.  If this
number exceeds INTOL (another parameter), then the output plane is set to
1 at location (i,j).  Figure 10(a) shows the operation of the intermediate
filter which transforms the input gray-scale picture into a "hit-count"
picture.  The PAU architecture permits this algorithm to be executed
simultaneously for all of the elements of the picture.  The approach is
similar in spirit to that used in some character recognition devices, such
as the IBM 1975 [67], in which a loose-fitting template samples
high-information parts of the pattern so that minor variations in type font
and print quality are accommodated.
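
The probing logic can be restated sequentially as follows.  This is a sketch
under stated assumptions, not the PAX II listing in the Appendix: it visits
picture elements one at a time rather than in parallel, treats a probe that
leaves the picture as a failure, omits the aspect-ratio corrections, and
tests the hit count with "at least INTOL" so that a setting of eight remains
satisfiable with eight probes.  The function names are hypothetical.

```python
# The eight principal directions (row step, column step).
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def probe(image, i, j, di, dj, iradlb, iradub, idrop):
    """True if the gray value first drops by more than idrop within [iradlb, iradub]."""
    for d in range(1, iradub + 1):
        r, c = i + d * di, j + d * dj
        if not (0 <= r < len(image) and 0 <= c < len(image[0])):
            return False                      # assumption: probes fail at the border
        if image[i][j] - image[r][c] > idrop:
            return d >= iradlb                # a drop before iradlb disqualifies
    return False                              # no drop-off by iradub


def blob(image, iradlb, iradub, idrop, intol):
    """Binary plane: 1 where at least intol of the eight probes found a drop-off."""
    out = [[0] * len(image[0]) for _ in image]
    for i in range(len(image)):
        for j in range(len(image[0])):
            hits = sum(probe(image, i, j, di, dj, iradlb, iradub, idrop)
                       for di, dj in DIRECTIONS)
            out[i][j] = int(hits >= intol)
    return out


# A 3 x 3 dark nucleus (gray value 5) on a clear background (0).
image = [[5 if 3 <= r <= 5 and 3 <= c <= 5 else 0 for c in range(9)] for r in range(9)]
plane = blob(image, iradlb=2, iradub=3, idrop=2, intol=8)
```

With these settings only the central element of the synthetic nucleus is
marked, since every other element of the nucleus sees a drop-off at
distance 1 in at least one direction.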

The subroutine BLOB is used in various ways in attacking the Pap
smear problem.  First, BLOB is applied to the picture with parameters
such that leukocyte-sized blobs are detected.  BLOB is capable of
identifying individual leukocytes in images of fairly coarse resolution, even
when they lie close to each other.  Each picture element in the images used
covers an area nominally 1.42 x 1.95 µm; the entire image covers a
rectangle approximately 182 x 250 µm on the specimen.  The BLOB subroutine
output plane (128 x 128 x 1 bit) contains a bit on for each picture element
found to represent a small blob.  After the small blobs have been located,
BLOB is iteratively reapplied to the same input picture, this time with
parameters defining blobs in the size ranges corresponding to
various types of atypical cell nuclei.
This second group of output planes is logically ORed together, forming
a composite "big blob" location plane.  Using a propagation process which
can be implemented by the PAU operation CONNECT, the small blobs are
"smeared", so that each on bit is replaced by a small cluster of on bits
[150, 270].  The smeared plane is inverted and ANDed with the big blob
plane, thereby wiping out big blob locations near small blobs.  In a big
blob caused by a cluster of little blobs, it is assumed that at least
one little blob location will be within smear range of the center of the
big blob.
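
The masking step amounts to a dilation followed by an AND with the inverted
result.  The sketch below is illustrative only: a square neighborhood stands
in for the cluster produced by CONNECT, whose exact shape is not specified
here, and smear and mask_clumps are hypothetical names.

```python
def smear(plane, distance):
    """Replace each on bit by a (2*distance+1)-square cluster of on bits."""
    rows, cols = len(plane), len(plane[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if plane[i][j]:
                for r in range(max(0, i - distance), min(rows, i + distance + 1)):
                    for c in range(max(0, j - distance), min(cols, j + distance + 1)):
                        out[r][c] = 1
    return out


def mask_clumps(big, small, distance):
    """AND the big-blob plane with the inverted, smeared small-blob plane."""
    smeared = smear(small, distance)
    return [[b & (1 - s) for b, s in zip(brow, srow)]
            for brow, srow in zip(big, smeared)]


big = [[0] * 6 for _ in range(6)]
big[1][1] = big[4][4] = 1          # two big-blob hits
small = [[0] * 6 for _ in range(6)]
small[1][2] = 1                    # a leukocyte next to the first hit
kept = mask_clumps(big, small, distance=1)
```

In this toy case the hit at (1,1) is wiped out because it lies within smear
range of the small blob, while the isolated hit at (4,4) survives.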

Given the BLOB subroutine described above, the primary problem
is to determine the values which should be assigned to the parameters
IRADUB, IRADLB, INTOL, IDROP (for big and small blobs) and the smear
distance.  In the big blob case, this was accomplished by making use of
Table 2, which was derived from quantitative studies by Wied and Reagen
discussed in Chapter II.C.1.  Table 2 shows mean nuclear area for five
classes of cancer-related conditions.  If the nuclei are assumed to be
nearly round, the area measurements are easily converted to approximate
radius measurements by means of the relation RADIUS=SQRT(AREA/3.1).
These can in turn be easily converted to digital raster distance, where
distance between two picture elements is defined as the number of unit shifts
required to carry one picture element onto another.  A unit shift carries
an element onto one of its eight (rectangular topology) nearest neighbors.
Clearly, shifts in a diagonal direction cover more ground on the
original image than do shifts parallel with the X or Y axes.  In addition,
the scanner system used here generates a raster with an aspect ratio
nominally 8.5:11, so that a shift in the direction of the scan (Y) axis
covers more ground on the original image than do shifts along the line
(X) axis.  Measurements on scanned photomicrographs showed that the aspect
ratio was actually closer to 8.0:11, so that picture element centers on
a 128 x 128 scan of a 102x photomicrograph were 1.95 µm distant from
each other (on the specimen) in the Y-direction and 1.42 µm distant in
the X-direction.  In order to prevent these factors from introducing
an anisotropism in the response of the blob detector, a correction was
applied to the parameters IRADLB and IRADUB, which are assumed to be given
in terms of the X-direction.  The correction for diagonal distance was
1.42/SQRT(1.42**2 + 1.95**2) = .59, and for vertical (Y-direction)
distance was equal to 1.42/1.95 = .73.
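
The arithmetic behind these conversions can be written out directly.  The
following sketch uses only the spacings and relations quoted in the text;
the function names are hypothetical, and converting a radius to X-direction
shifts by dividing by the X spacing and rounding is shown as one plausible
reading of the conversion.

```python
import math

DX_UM, DY_UM = 1.42, 1.95   # picture-element spacing on the specimen (from the text)

diagonal_factor = DX_UM / math.hypot(DX_UM, DY_UM)   # X shifts -> diagonal shifts
vertical_factor = DX_UM / DY_UM                      # X shifts -> Y shifts


def radius_um(area_um2):
    """Approximate radius of a nearly round nucleus, RADIUS = SQRT(AREA/3.1)."""
    return math.sqrt(area_um2 / 3.1)


def x_raster_distance(radius):
    """Radius in micrometers -> whole X-direction unit shifts (rounded)."""
    return int(round(radius / DX_UM))


print(round(diagonal_factor, 2), round(vertical_factor, 2))  # 0.59 0.73
```

A distance of five shifts along X, for example, corresponds to roughly
5 x .73 shifts along Y and 5 x .59 shifts along a diagonal.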

Three different raster distance values were established
on the basis of information in Table 2:

Condition         Cell Nucleus     Raster Distance
                  Radius (µm)      (X-Direction)

Dysplasia             7.4               5
Ca in situ            6.0               4
Kerat. Ca             5.0               4
Non-kerat. Ca         5.3               4
Small-cell Ca         4.?               3

where the X-direction raster distance is obtained by dividing the radius
in µm by 1.42, rounding, and taking the integer part.  Subroutine BLOB
was applied three times to the input image shown in Figure 9(b), with
parameters set each time to detect one of the size ranges corresponding
to an atypical cell class of Table 2.  INTOL and IDROP were set to eight
and three, respectively, for all iterations.  When the three output planes
were ORed together, 129 hits resulted, of which 82 were judged (by looking
at the original photomicrograph) to be clumps of white cells and 47 were
judged not to be clumps of white cells.  Figure 11(a) shows these 129
hits circled.  IRADLB and IRADUB were set equal to one for the clump mask,
since at the low resolution, only one picture element per leukocyte is at
maximum density.  Values for INTOL and IDROP were determined experimentally
by varying INTOL and IDROP and observing the overall error rate.  Table 6
shows performance of the masking process on the training picture as a
function of INTOL and IDROP, where performance is measured by counting the
number of big-blob-detector hits which were visibly (in the original
photomicrograph) caused by clumps of leukocytes and noting the rate at which
these are successfully rejected by the masking procedure.  An effort was
made to eliminate effects of context on this subjective decision by viewing
the objects through an aperture.  A "false alarm" is a failure to mask
off a clump, while a "miss" is the masking of a non-clump.  Note that the
major consideration in this experiment was the false alarm rate caused by
leukocyte clumping.  Thus, response by the blob detector when presented
with other objects was not pursued, i.e., the big blob parameters and the
consequent hit rate were assumed to be given, and the problem was defined
as minimization of the overall error rate by adjusting the clump mask
parameters.  As indicated by Table 6, the optimal values of INTOL and IDROP
were found to be INTOL=7 and IDROP=1 for this training image, resulting in
a false alarm probability of .07, a miss probability of .66, and an overall
error probability of .29.  Figure 11(b) shows the effect of masking big
blob hits using the optimal mask parameter values.

Table 6.   Blob-detector Performance as a Function of
Small-blob Parameter Values

(Clumps of leukocytes are considered noise, other blobs
larger than individual leukocytes are considered signal.)   Figures 11(a)
and 11(b) display processing using the optimal values of INTOL = 7, IDROP = 1.

Parameters  Performance 

B.   Cell Detection

As  described  in  Chapter  II,  images  of  superficial  cells  can 
become  extremely  complex.   It  is  therefore  necessary  to  determine 
precisely  the  data  to  be  extracted  so  that  unnecessary  processing  is 
avoided.   In  the  present  work,  it  was  decided  to  try  to  locate  fairly 
well-preserved  superficial  cells  which  might  be  touching  or  overlapping 
each  other,  but  not  badly  crumpled  or  contained  in  thick  cell  masses. 
The  cells  to  be  detected  would  be  the  same  ones  a  human  microscopist 
could  readily  count. 

It was felt that it should not be necessary to perform detailed
image segmentation to accomplish this, and in fact, that this would be
impossible to do in the low-resolution images contemplated.  Other workers
[177, 106] have described algorithms to handle touching or overlapping
cells, but usually postulate an idealized input image in which edges
are unambiguously determined and noise-free.  As can be seen in Figure 9,
this is far removed from reality, and the algorithms would be useless
if applied to the output of any conceivable preprocessing scheme operating
on real images in real time.  Precise delineation of edges is not required
for the present purpose since no measurements are taken.  The presence
of  a  cell  in  a  digitized  image  is  established  by  applying  a  crude 
structural  model  of  a  flattened  epithelial  cell  and  reporting  a  hit 
where  there  is  sufficient  match,  an  approach  which  is  similar  to  that 
taken  in  subroutine  BLOB.   In  this  case,  the  model  must  be  somewhat  more 
complex  so  as  to  account  for  the  greater  range  of  possible  configurations 
of  the  complete  cells. 

Processing is based on the model of the cell image described
previously in Chapter II.B.2, in which the cells of interest are composed
of a region of fairly consistent optical density (the cytoplasm) with a
darker "blob" (the nucleus) more or less in the center, as in Figure 3.
Cell detection proceeds by attempting to find blobs which are
approximately centered between pairs of "step-down" edges.  A step-down edge
is an edge oriented orthogonal to a line radiating from a blob point,
where the gray value changes in the negative direction as the line crosses
the edge.  In principle, this process is insensitive to overlapping of
cytoplasms, since the requirement for equally distant negative-going
edges will associate edge pairs with the correct nucleus, unless either
(1) the cells are nearly coincident, in which case the distance between
the nuclei may be less than the tolerance established for the
equal-distance criterion, or (2) the cells have off-center nuclei.  In case
(2), disambiguation may still be possible where the nucleus is centered in
at least one direction.

The present implementation of this procedure takes the form of
five PAX II subroutines:  AVG, DIRDIF, RIDGE, RADAR and BLOB.  AVG performs
a digital low-pass filtering operation which is intended to decrease the
effect of small detail and noise on the edge computation.  AVG uses an
efficient parallel algorithm described by Rosenfeld and Thurston [182],
in which the gray value of each picture element is replaced by the
average gray value in an n x n square neighborhood of the element, where
n is a power of 2.  AVG does this computation in parallel using
arithmetic of n*2+4 bits precision.  The optimal size of n is a function of
the "conspicuousness" (Rosenfeld's term) of the edge to be detected.  For
this application, n was established by trial-and-error using the
interactive facilities of Show-and-Tell.  Subroutines DIRDIF and RIDGE
cooperate to reduce the blurred image to a binary plane with ones marking the
location of edges orthogonal to a given direction.  DIRDIF performs a
directional differencing operation on the AVG output by subtracting it
from a shifted copy, where the shift is in the direction of interest and
has magnitude n (the size of the side of the averaging area).  RIDGE
detects ridges (local maxima occurring in chains) in the difference
picture and eliminates spurious maxima not aligned orthogonally to the given
direction.  The sign of the difference is available for use in establishing
whether an edge is negative-going with respect to the given direction.
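
The averaging and differencing steps can be sketched sequentially as follows.
This is an illustration under assumptions, not the PAX II code: the averaging
window is anchored at its top-left corner, elements whose shifted partner
falls outside the picture are set to zero, and the ridge-following step of
RIDGE is omitted.  The names avg and dirdif merely echo the subroutine names.

```python
def avg(image, n):
    """Mean gray value over each n x n window (anchored at its top-left corner)."""
    rows, cols = len(image) - n + 1, len(image[0]) - n + 1
    return [[sum(image[i + a][j + b] for a in range(n) for b in range(n)) / (n * n)
             for j in range(cols)] for i in range(rows)]


def dirdif(smooth, n, di, dj):
    """Shifted copy minus original; the shift is n steps along (di, dj)."""
    rows, cols = len(smooth), len(smooth[0])
    return [[smooth[i + n * di][j + n * dj] - smooth[i][j]
             if 0 <= i + n * di < rows and 0 <= j + n * dj < cols else 0.0
             for j in range(cols)] for i in range(rows)]


# A vertical step edge: gray value 8 on the left half, 0 on the right.
image = [[8 if c < 8 else 0 for c in range(16)] for _ in range(4)]
diff = dirdif(avg(image, 2), 2, 0, 1)
print(min(diff[0]), diff[0].index(min(diff[0])))  # -8.0 6
```

The extreme value of the difference sits on the edge and its negative sign
marks the edge as step-down with respect to the +X direction.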

Subroutine BLOB is pressed into service again to detect the
leukocyte-sized cell nuclei used as reference points for the cell-finding
operation.  In order to determine which of the blob points is centered
between negative-going edges, another subroutine, RADAR, is used.
RADAR causes each blob point to become a sort of radar transmitter.
The edge points for one sign and direction are the targets.  Using CONNECT,
the propagation operation of the PAU, a spreading beam of ones is propagated
in a given direction (see Figures 10(b) and 12(e)).  Edge points caught in
the beam are identified by ANDing the beam with the edge point plane
at each stage of propagation of the beam.  These edge points are caused
to reflect a signal back to the blob "transmitter" along the beam, again
by using CONNECT.  A plane containing the blob points which received
return signals at each propagation stage is stored.  RADAR is then
re-applied in the opposite direction with edges of opposite sign as
targets.  The output, then, of RADAR applied in a pair of directions is
two stacks of planes containing coded range information.  Distance is
represented by the position of the plane in the stack, and on bits in a
plane indicate which blob points received echoes at that distance.  If
successive pairs of planes, one from each stack, are ANDed, the
result is a plane having a bit on for each blob found to be approximately
centered between a pair of step-down edges.  The number of edge pairs
discovered for each blob can be used as a measure of the degree of
"ill-formedness" of the cells.  Cells detected at an ill-formedness index
of two are shown in Figure 12(f).
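
The range-matching idea can be reduced to a few lines if the spreading beam
is replaced by a single straight probe.  The sketch below makes that
simplification and also requires exactly equal ranges rather than
approximately equal ones; the function names and the coordinates in the
example are hypothetical.

```python
def echo_ranges(blob, targets, step, max_range):
    """Ranges (in unit shifts) at which a straight probe from blob meets a target."""
    (i, j), (di, dj) = blob, step
    return {r for r in range(1, max_range + 1) if (i + r * di, j + r * dj) in targets}


def centered_blobs(blobs, edges_ahead, edges_behind, step, max_range):
    """Blobs that receive echoes at the same range in a direction and its opposite."""
    back = (-step[0], -step[1])
    return [b for b in blobs
            if echo_ranges(b, edges_ahead, step, max_range)
            & echo_ranges(b, edges_behind, back, max_range)]


blobs = [(5, 5), (5, 6)]
edges_ahead = {(5, 9)}    # step-down edge met when moving in +X
edges_behind = {(5, 1)}   # step-down edge met when moving in -X
print(centered_blobs(blobs, edges_ahead, edges_behind, step=(0, 1), max_range=6))
# [(5, 5)]
```

Intersecting the two range sets plays the role of ANDing corresponding
planes of the two stacks: a blob survives only if some range produced an
echo in both directions.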

C.   Texture Processing

Drying artifact can cause large, dark blobs in cervical smears,
which must be distinguished from cell nuclei to avoid excessive false
positive reports.  Drying artifact occurs when cells are not fixed promptly
upon being spread on the slide and looks like the large masses featured
in Figure 13(a).  Chromatin is the result of staining the genetic material
in the cell nucleus.  When the chromatin shows a texture consisting of
large dark clumps it is said to show an active chromatin texture.  This
condition can occur in nonmalignant cells, but is much more pronounced
in the case of cancer.  In Figure 13(b), cell nuclei with active
chromatin are approximately centered in the photograph.  Note that the
drying artifact is a somewhat glassy material containing refractile and
light-absorbing areas, while the chromatin textures are mostly composed
of light-absorbing areas and are in most cases somewhat less contrasty
than the artifact.

Figure 10(a).   "BLOB" Operation.   Three Gray Levels.
IRADUB = 2, IRADLB = 1, IDROP = 2

To test the applicability of the varivalued logic approach
(Section IV.D) to this problem, sample textures were acquired by
selecting 14 chromatin and 13 artifact samples from 5 different Pap smears.
In some cases the same cell was sampled more than once.  An attempt was
made to sample textures that appeared to the eye to contain some local
texture information and to avoid texture regions that would obviously
need contextual data for discrimination.  For example, the chromatin in
some cell nuclei is so condensed as to present an opaque dark blob.  It
would make no sense to try to separate these on the basis of local
texture.

The 27 samples contained 32 x 32 picture elements each, with the
sample covering an area of perhaps 100 µm^2 on the microscope slide.  The
gray values were quantized initially to 16 equally spaced gray levels.
This was reduced subsequently to 4 equally spaced gray levels to reduce
processing costs.

Nine each of the chromatin and artifact texture samples were
designated the training set and the rest were set aside as an unknown or
test set.  The chromatin texture is considered to be T^1 and the artifact
is T^0.

A 3 x 2 neighborhood was used, so there was a total of 30 x 31 x 18
= 16,740 events in the training set.  Conditional probabilities and
likelihood ratios were computed and the ROC curve labeled D in Figure 13(f)
was generated as in the one-dimensional example.  According to the ROC,
if the decision rule "decide chromatin if LR(e) > 1" is used, one can
expect that when a piece of chromatin texture is presented to the
categorizer 67 percent of the events will be labeled "chromatin" and when
artifact is presented 54 percent of the events will be incorrectly
labeled chromatin.  This information is then used to set a threshold to
classify texture regions, in this case the patches of texture in the
samples.  (Note that we were previously classifying local neighborhoods.
Now we are classifying regions in a digitized picture.)  The training set
events were labeled using the LR(e) > 1 decision rule and the
chromatin-labeled events (hits) were counted for each patch of texture.  The
sample patches were then classified as T^1 (chromatin) if there were more
than 650 hits in a patch and T^0 otherwise.  There was not complete
separation: one-third of the chromatin texture patches were called artifact
and 11 percent of the artifact patches were called chromatin, a
misclassification rate of about 22 percent on the training set.
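
The event count and the patch-level decision can be sketched as follows.
This is an illustration only: the predicate accepts is a hypothetical
stand-in for the local categorizer, the patch is synthetic, and the 3 x 2
neighborhood is taken to be three elements wide and two high.

```python
def events(patch, width=3, height=2):
    """Yield every width x height neighborhood of the patch as a flat tuple."""
    for i in range(len(patch) - height + 1):
        for j in range(len(patch[0]) - width + 1):
            yield tuple(patch[i + a][j + b] for a in range(height) for b in range(width))


def classify_patch(patch, accepts, threshold=650):
    """Label a patch T1 (chromatin) when more than threshold events are hits."""
    hits = sum(1 for e in events(patch) if accepts(e))
    return ("T1" if hits > threshold else "T0"), hits


patch = [[(r + c) % 4 for c in range(32)] for r in range(32)]   # synthetic 4-level patch
events_per_patch = sum(1 for _ in events(patch))
print(events_per_patch, events_per_patch * 18)  # 930 16740
```

Each 32 x 32 patch thus yields 930 events, so the 650-hit threshold asks
that roughly 70 percent of the neighborhoods in a patch be labeled chromatin.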

Likelihood ratios computed on the training set were used to
assign the training set events to the chromatin or artifact class, using
the LR>1 criterion. These events were fed to the A algorithm and an
interval covering which required 31 intervals was generated. When this was
applied as a local operation to the unknown texture sample set
(Figure 13(c)), the binary result was as shown in Figure 13(d). Black spots
in Figure 13(d) indicate that the neighborhood in the vicinity of the spot
was categorized as chromatin by the local operation. These spots were
counted for each texture patch, as shown in Figure 13(d), and the 650
threshold was used to classify the samples. Figure 13(e) shows how the
program classified each sample. In the case of the unknown set, only one
error was made.
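
Applying an interval cover as a local operation can be pictured with the
following sketch. It rests on several assumptions that are not taken from
the program described here: that each interval is a list of (low, high)
gray-value ranges, one per picture element of the neighborhood; that a
neighborhood is a hit when it falls inside at least one interval; and that
the 3 x 2 neighborhood is three elements wide and two high. All names are
hypothetical.

```python
def in_interval(event, interval):
    """True if every component of the event lies within its (low, high) range."""
    return all(low <= value <= high for value, (low, high) in zip(event, interval))


def covered(event, cover):
    """True if the event falls inside at least one interval of the cover."""
    return any(in_interval(event, interval) for interval in cover)


def hit_map(image, cover, height=2, width=3):
    """Slide the neighborhood over the image and mark each position as hit (1) or not (0)."""
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(rows - height + 1):
        row_hits = []
        for c in range(cols - width + 1):
            # The event is the neighborhood's gray values read row by row.
            event = tuple(
                image[r + dr][c + dc] for dr in range(height) for dc in range(width)
            )
            row_hits.append(1 if covered(event, cover) else 0)
        result.append(row_hits)
    return result


def hit_count(image, cover):
    """Total number of hits in a patch, to be compared against the patch threshold."""
    return sum(sum(row) for row in hit_map(image, cover))


# Toy example: a one-interval cover accepting neighborhoods whose six
# elements all have gray value 2 or 3 (hypothetical values).
toy_cover = [[(2, 3)] * 6]
toy_image = [
    [3, 3, 3, 0],
    [3, 3, 3, 0],
]
print(hit_map(toy_image, toy_cover))
print(hit_count(toy_image, toy_cover))
```

The hit map plays the role of the binary picture in Figure 13(d), and the
hit count is the quantity compared with the 650 threshold.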

Figure 13(c).   Test Set of Texture Samples.
Top Row and First Two in Second Row Are T1

Figure 13(d).   Test Set With Hit Counts

Figure 13(e).   Test Set Classified

Figure 13(f).   Receiver Operating Characteristic for Chromatin vs. Artifact

Curve A: 1x1 Neighborhood; B: 1x2; C: 2x2; and D: 3x2 Neighborhood.

Figure 13(g).   Receiver Operating Characteristic with
Tapered Quantization of Samples

VI.   CONCLUSIONS

A.   Blob and Cell Detection

The performances of the blob and cell detectors indicate that
this capability for more complex shape analysis has the potential to
overcome the immediate problems which have prevented development of a
Cytoanalyzer-like instrument.

The images used in testing the programs were of approximately the
level of complexity one would expect in a mechanically dispersed cell
sample, such as the one developed for Cytoanalyzer. Therefore, it is
reasonable to predict that a processor incorporating the algorithms
described above could examine a dispersed cell sample, counting
well-differentiated epithelial cells and detecting enlarged blobs while
ignoring white cell clumps, two processes which were essential to the
Cytoanalyzer's planned mode of operation, but which proved to be
technically infeasible. If the Cytoanalyzer-format slide is used
(1 cm x 5 cm), approximately 12,500 fields of 200 µm x 200 µm would have to
be processed. To do this in four minutes (approximately the time required
by human screeners), the processing rate would have to approach fifty
frames per second. This frame rate is well within the realm of possibility
for available television cameras, analog-to-digital converters, and
memories at the resolution and signal-to-noise capacities required by the
blob-filtering and cell-detecting algorithms discussed here. Processing
each frame with the algorithms described required 991 fundamental parallel
array operations (Boolean operations between planes, unit shifts of planes)
and 484 more complex array operations (addition, subtraction and comparison
of stacks of planes interpreted as arrays of binary numbers). To estimate
time requirements, it is assumed that each complex operation takes six
times as long as a fundamental operation. This rule of thumb was a result
of observing that integrated-circuit arithmetic/logic units take about six
times as long to perform an operation as simple gates take. Using this
information, and assuming that scanning can be completely overlapped with
processing so that all 20 ms of each frame period are available, the
equivalent of 991 + 6 x 484 = 3,895 fundamental operations must fit in
20 ms, so it can be calculated that 5.1 µs are available for each
fundamental operation. This execution time is clearly attainable within
the present state of the art of circuit technology.
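
The timing estimate above can be checked with a few lines of arithmetic.
The sketch below is written in Python purely for illustration; the
function and parameter names are not from the original work, and the
complex-operation count is taken to be 484, the value consistent with the
5.1 µs result quoted in the text.

```python
def field_count(slide_w_um=10_000, slide_h_um=50_000, field_um=200):
    """Number of square fields needed to tile a 1 cm x 5 cm slide."""
    return (slide_w_um * slide_h_um) // (field_um * field_um)


def required_frame_rate(fields, screening_time_s=4 * 60):
    """Frames per second needed to scan all fields in the screening time."""
    return fields / screening_time_s


def equivalent_fundamental_ops(fundamental=991, complex_ops=484, cost_ratio=6):
    """Per-frame work, with each complex operation costed as six fundamental ones."""
    return fundamental + cost_ratio * complex_ops


def time_per_fundamental_op_us(frame_period_s=20e-3, ops=None):
    """Microseconds available per fundamental operation in one frame period."""
    if ops is None:
        ops = equivalent_fundamental_ops()
    return frame_period_s / ops * 1e6


fields = field_count()                       # 12500 fields
print(fields)
print(required_frame_rate(fields))           # about 52 frames per second
print(equivalent_fundamental_ops())          # 3895 equivalent operations
print(time_per_fundamental_op_us())          # about 5.13 microseconds
```

The 20 ms frame period corresponds to the round figure of fifty frames per
second, slightly below the roughly 52 frames per second that an exact
four-minute scan would need.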

B.   Texture  Analysis 

In  view  of  the  success  of  recent  studies  in  identifying  cancer 
cells  through  texture  features  [125,  18?]  and  also  in  view  of  the  large 
weight  commonly  placed  on  chromatin  texture  by  diagnosticians  [52],  it 
seems  clear  that  addition  of  a  texture-analyzing  capability  would  greatly 
enhance  the  effectiveness  of  an  image-processing  approach  to  automated 
cancer  cell  detection. 

The experiments reported here demonstrated that texture information
permits a greater degree of discrimination between chromatin and artifact
than is obtainable from intensity data alone. However, a degree of
ambiguity is not resolved by the present procedure. Figure 13(h) shows the
distribution of hit count values (as calculated in Section V.C.) for the
sample texture patches in the training set and in the unknown set, a total
of 27 observations. While there are too few samples to provide a very
reliable picture of the actual distributions, it seems clear that there are
two populations and that they overlap. There are several parameters that
can be adjusted to try to improve separation: sampling resolution in the
initial scan; scanning beam wavelength; local neighborhood size and
configuration; and quantization scheme. For example, Figure 13(f) shows the
result of changing the local neighborhood configuration from 1 x 1 to
3 x 2. The hit probability (for the LR>1 criterion) goes from approximately
55 percent for 50 percent false alarms to 67 percent for 54 percent false
alarms, i.e., the hit rate increased 12 percent while the false alarm rate
increased 4 percent. The same 16-level texture patches were used in the
generation of Figure 13(g); however, an alternative quantization scheme was
applied. Instead of quantizing into four gray-value ranges of equal width,
a "tapered" quantization was used [181]. With this method, the gray-value
ranges are adjusted so that approximately the same number of picture
elements will have each of the (quantized) values, or to put it another
way, each gray value will occur with equal probability in the quantized
picture. This has the effect of increasing detail in large areas of
low-amplitude high-frequency modulation. The quantization ranges were
set once by using a composite gray-value histogram derived from all
training set samples. As can be seen by looking at the coordinates of
point D in Figure 13(g), for the LR>1 decision rule and using a 3 x 2 local
neighborhood, a 58 percent hit rate can be achieved at the expense of
38 percent false alarms, a separation of 23 percent, which is an
improvement over the 13 percent separation observed with the
equal-range-width quantizing of Figure 13(f). By systematically adjusting
parameters in this way and monitoring expected performance via the ROC, it
is possible to arrive at an interval cover with optimal effectiveness for
this application.
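
The "tapered" quantization described above amounts to choosing cut points
from the composite gray-value histogram so that each output level receives
roughly the same number of picture elements. A minimal sketch in Python
follows. The function names and the quantile-based choice of cut points are
assumptions made for illustration, not the procedure of [181] or of the
original program.

```python
from bisect import bisect_right


def tapered_thresholds(pixels, levels=4):
    """Return levels-1 cut points that split the pooled gray values into
    bins holding approximately equal numbers of picture elements."""
    ordered = sorted(pixels)
    n = len(ordered)
    return [ordered[(k * n) // levels] for k in range(1, levels)]


def quantize(value, thresholds):
    """Map a gray value to its quantized level (0 .. len(thresholds))."""
    return bisect_right(thresholds, value)


# Composite sample pooled from all training patches (toy values, 16 levels).
composite = [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 9, 12, 14, 15]
cuts = tapered_thresholds(composite, levels=4)
quantized = [quantize(v, cuts) for v in composite]
print(cuts)
print([quantized.count(level) for level in range(4)])
```

With equal-width ranges, most of the toy values above would fall into the
lowest of the four levels. The tapered cut points spread them across all
four, which is why detail in densely populated gray-value ranges is
preserved. Heavily repeated gray values can make the bin counts only
approximately equal.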

Figure 13(h).   Hit Count Distribution

C.   Recommendations for Further Work

The  results  of  these  experiments,  while  encouraging,  indicate 
that  much  work  needs  to  be  done  in  adjusting  the  system  parameters  before 
an  acceptable  performance  level  can  be  achieved.   Specifically,  studies 
using  the  protocols  described  above  should  be  conducted  on  large  samples  of 
mechanically-dispersed  cells  so  that  optimal  determinations  can  be  made 
for  variables  such  as  scanning  wavelength  and  photometric  and  spatial 
resolution,  big-blob  detector  parameters,  and  cell  detector  parameters. 
To  do  this  economically,  the  programs  must  be  converted  from  PAX  II  to 
PAU  machine  language  so  that  existing  dedicated  image-processing  facilities 
can  be  used.   The  IBM  360  is  much  too  expensive  to  be  used  for  large-scale 
runs.   PAX  II  is  at  a  slightly  higher  level  than  PAU  machine  language,  so 
the  mapping  is  not  one-to-one.   However,  the  basic  structure  of  the 
algorithm  will  survive  the  conversion. 

It  seems  clear  that  texture,  as  well  as  colorimetric  data, 
should  be  much  more  extensively  investigated  as  a  cell  characterizer  both 
for  cancer  screening  and  for  other  cytological  image  analysis  applications. 
The processing of color data has been well investigated in connection
with analysis of multispectral images and data generated by remote sensing
devices such as those carried by satellites or aircraft. Multispectral data
is formatted as a multiple-component vector with each component
representing detected energy integrated over some portion of the
electromagnetic spectrum. The varivalued logic approach can be used to
define efficient color vector categorizers in the same manner as in the
texture case. For the
cytology  application,  it  is  essential  that  the  scanning  equipment  used  be 
photometrically  calibrated  so  that  cytochemical  techniques  can  be  employed 
in  devising  and  evaluating  cell  analysis  programs. 

Finally, other applications requiring the extraction of quantitative
information from large numbers of cells should be considered. Two
particularly interesting applications are in the quantitative study of
neuron depopulation and in cell motility investigations. The counting
of neurons in sectioned brains is a compelling candidate, since shape
analysis is needed to distinguish the neurons from glial cells and because
there is a high data rate requirement. The blob-detecting program should
be transferable to the neuron counting problem. Motility studies involving
location and tracking of moving cells or small organisms in movie
film or from real-time television scanning are another application
requiring rapid blob location. Tracking individual cells in cell cultures
could be of use in following the life history of specific cells being
subjected to controlled stresses, e.g., presence of carcinogenic substances.

PURPOSE:  TO LOCATE ALL EDGES IN DIRECTION ICIR FROM POINTS
INDICATED BY ONES IN PLANE PEOGE.   SC^IST HAS FOUR PLANES, SO